Abstract

This work considers computationally efficient privacy-preserving data release. We study the task of
analyzing a database containing sensitive information about individual participants. Given a set of
statistical queries on the data, we want to release approximate answers to the queries while also
guaranteeing differential privacy, protecting each participant's sensitive data.

Our focus is on computationally efficient data release algorithms; we seek algorithms whose running
time is polynomial, or at least sub-exponential, in the data dimensionality. Our primary contribution is a
computationally efficient reduction from differentially private data release for a class of counting queries,
to learning thresholded sums of predicates from a related class.

We instantiate this general reduction with a variety of algorithms for learning thresholds. These
instantiations yield several new results for differentially private data release. As two examples, taking
$\{0,1\}^d$ to be the data domain (of dimension $d$), we obtain differentially private algorithms for:



1. Releasing all $k$-way conjunction counting queries (or $k$-way contingency tables). For any given $k$,
the resulting data release algorithm has bounded error as long as the database is of size at least
$d^{O(\sqrt{k \log(k \log d)})}$ (ignoring the dependence on other parameters). The running time is polynomial
in the database size. The best sub-exponential time algorithms known prior to our work required a
database of size $\tilde{O}(d^{k/2})$ [Dwork, McSherry, Nissim and Smith 2006].

2. Releasing a $(1-\gamma)$-fraction of all $2^d$ parity counting queries. For any $\gamma \geq \mathrm{poly}(1/d)$, the algorithm
has bounded error as long as the database is of size at least $\mathrm{poly}(d)$ (again ignoring the dependence
on other parameters). The running time is polynomial in the database size.
Several other instantiations yield further results for privacy-preserving data release. Of the two results
highlighted above, the first learning algorithm uses techniques for representing thresholded sums of
predicates as low-degree polynomial threshold functions. The second learning algorithm is based on
Jackson's Harmonic Sieve algorithm [Jackson 1997]. It utilizes Fourier analysis of the database viewed
as a function mapping queries to answers.



* Center for Computational Intractability, Department of Computer Science, Princeton University. Supported by NSF grants
CCF-0426582 and CCF-0832797. Email: mhardt@cs.princeton.edu.

† Microsoft Research, Silicon Valley Campus. Most of this work was done while the author was at the Department of Computer
Science at Princeton University and supported by NSF Grant CCF-0832797 and by a Computing Innovation Fellowship. Email:
rothblum@alum.mit.edu.

‡ Columbia University Department of Computer Science and Center for Computational Intractability, Department of
Computer Science, Princeton University. Supported by NSF grants CCF-0832797, CNS-0716245 and CCF-0915929. Email:
rocco@cs.columbia.edu



1 Introduction 

This work considers privacy-preserving statistical analysis of sensitive data. In this setting, we wish to
extract statistics from a database $D$ that contains information about $n$ individual participants. Each individual's
data is a record in the data domain $U$. We focus here on the offline (or non-interactive) setting, in which
the information to be extracted is specified by a set $Q$ of statistical queries. Each query $q \in Q$ is a function
mapping the database to a query answer, where in this work we focus on real-valued queries with range
$[0, 1]$. Our goal is data release: extracting approximate answers to all the queries in the query set $Q$.

An important concern in this setting is protecting the privacy of individuals whose sensitive data (e.g. 
medical or financial records) are being analyzed. Differential privacy [DMNS06] provides a rigorous notion 
of privacy protection, guaranteeing that each individual only has a small effect on the data release algorithm's 
output. A growing body of work explores the possibility of extracting rich statistics in a differentially private 
manner. One line of research [BLR08, DNR+09, DRV10, RR10, HR10] has shown that differential privacy
often permits surprisingly accurate statistics. These works put forward general algorithms and techniques 
for differentially private data analysis, but the algorithms have running time that is (at least) exponential in 
the dimensionality of the data domain. Thus, a central question in differentially private data analysis is to 
develop general techniques and algorithms that are efficient, i.e. with running time that is polynomial (or 
at least sub-exponential) in the data dimensionality. While some computational hardness results are known 
[DNR+09, UV11, GHRU11], they apply only to restricted classes of data release algorithms.

This Work. Our primary contribution is a computationally efficient new tool for privacy-preserving data 
release: a general reduction to the task of learning thresholds of sums of predicates. The class of predicates 
(for learning) in our reduction is derived directly from the class of queries (for data release). 

At a high level, we draw a connection between data release and learning as follows. In the data release 
setting, one can view the database as a function: it maps queries in Q to answers in [0, 1]. The data release 
goal is approximating this function on queries/examples in Q. The challenge is doing so with only bounded 
access to the database/function; in particular, we only allow access that preserves differential privacy. For 
example, this often means that we only get a bounded number of oracle queries to the database function 
with noisy answers. 
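
The view of the database as a function with bounded, noisy access can be illustrated with a minimal sketch. This is not the paper's algorithm: the class name `NoisyDatabaseOracle`, the per-query Laplace noise of scale $1/(\epsilon n)$, the fixed query budget, and the example predicate are all assumptions chosen for illustration.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

class NoisyDatabaseOracle:
    """Database viewed as a function q -> [0, 1], reachable only via noisy answers."""

    def __init__(self, database, predicate, eps_per_query, budget, seed=0):
        self.database = database      # list of items u
        self.predicate = predicate    # P(q, u) in {0, 1}
        self.eps = eps_per_query      # privacy cost charged per answered query
        self.budget = budget          # maximum number of oracle queries (hypothetical)
        self.rng = random.Random(seed)

    def exact(self, q):
        # Fraction of items satisfying q; never released directly.
        return sum(self.predicate(q, u) for u in self.database) / len(self.database)

    def query(self, q):
        if self.budget <= 0:
            raise RuntimeError("query budget exhausted")
        self.budget -= 1
        # A counting query changes by at most 1/n between adjacent databases,
        # so Laplace noise of scale 1/(eps * n) makes each single answer eps-private.
        n = len(self.database)
        return self.exact(q) + laplace(1.0 / (self.eps * n), self.rng)

def conj(q, u):
    # Monotone conjunction: every attribute selected by q is set in u.
    return int(all(ui for qi, ui in zip(q, u) if qi))

D = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 1, 1)]
oracle = NoisyDatabaseOracle(D, conj, eps_per_query=1.0, budget=2)
noisy_answer = oracle.query((1, 1, 0))
```

Each answered query consumes privacy budget, which is why the number of oracle calls available to a learner in this setting is bounded.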

At this high level there is a striking similarity to learning theory, where a standard goal is to efficiently 
learn/approximate a function given limited access to it, e.g. a bounded number of labeled examples or oracle 
queries. Thus a natural approach to data release is learning the database function using a computational 
learning algorithm. 

While the approach is intuitively appealing at this high level, it faces immediate obstacles because of 
apparent incompatibilities between the requirements of learning algorithms and the type of "limited" access 
to data that are imposed by private data release. For example, in the data release setting a standard technique 
for ensuring differential privacy is adding noise, but many efficient learning algorithms fail badly when run 
on noisy data. As another example, for private data release, the number of (noisy) database accesses is often 
very restricted: e.g. sub-linear, or at most quadratic in the database size. In the learning setting, on the other
hand, it is almost always the case that the number of examples or oracle queries required to learn a function 
is lower bounded by its description length (and is often a large polynomial in the description length). 

Our work explores the connection between learning and private data release. We 

(i) give an efficient reduction that shows that, in fact, a general class of data release tasks can be reduced 
to related and natural computational learning tasks; and 

(ii) instantiate this general reduction using new and known learning algorithms to obtain new computa- 
tionally efficient differentially private data release algorithms. 



Before giving more details on our reduction in Section 1.1, we briefly discuss its context and some of the 
ways that we apply/instantiate it. While the search for efficient differentially private data release algorithms 
is relatively new, there are decades of work in learning theory aimed at developing techniques and algorithms 
for computationally efficient learning, going back to the early work of Valiant [Val84]. Given the high-level 
similarity between the two fields, leveraging the existing body of work and insights from learning theory for 
data release is a promising direction for future research; we view our reduction as a step in this direction. 
We note that our work is by no means the first to draw a connection between privacy-preserving data release 
and learning theory; as discussed in the "Related Work" section below, several prior works used learning
techniques in the data release setting. A novelty in our work is that it gives an explicit and modular reduction 
from data release to natural learning problems. Conceptually, our reduction overcomes two main hurdles: 

- bridging the gap between the noisy oracle access arising in private data release and the noise-free 
oracle access required by many learning algorithms (including the ones we use). 

- avoiding any dependence on the database size in the complexity of the learning algorithm being used. 

We use this reduction to construct new data release algorithms. In this work we explore two main
applications of our reduction. The first aims to answer boolean conjunction queries (also known as contingency
tables or marginal queries), one of the most well-motivated and widely-studied classes of statistical queries
in the differential privacy literature. Taking the data universe $U$ to be $\{0,1\}^d$, the $k$-way boolean conjunction
corresponding to a subset $S$ of $k$ attributes in $[d]$ counts what fraction of items in the database have all the
attributes in $S$ set to 1. Approximating the answers for $k$-way conjunctions (or all conjunctions) has been the
focus of several past works (see, e.g. [BCD+07, KRSU10, UV11, GHRU11]). Applying our reduction with
a new learning algorithm tailored for this class, we obtain a data release algorithm that, for databases of size
$d^{O(\sqrt{k \log(k \log d)})}$, releases accurate answers to all $k$-way conjunctions simultaneously (we ignore for now the
dependence of the database size on other parameters such as the error). The running time is $\mathrm{poly}(d^k)$.
Previous algorithms either had running time $2^{\Omega(d)}$ (e.g. [DNR+09]) or required a database of size $d^{k/2}$ (adding
independent noise [DMNS06]). We also obtain better bounds for the task of approximating the answers
to a large fraction of all (i.e. $d$-way) conjunctions under arbitrary distributions. These results follow from
algorithms for learning thresholds of sums of the relevant predicates; we base these algorithms on learning
theory techniques for representing such functions as low-degree polynomial threshold functions, following
works such as [KS04, KOS04]. We give an overview of these results in Section 1.2 below.

Our second application uses Fourier analysis of the database (viewed, again, as a real-valued function
on the queries in $Q$). We obtain new polynomial and quasi-polynomial data release algorithms for parity
counting queries and low-depth ($\mathrm{AC}^0$) counting queries respectively. The learning algorithms we use for
this are (respectively) Jackson's Harmonic Sieve algorithm [Jac97], and an algorithm for learning
Majority-of-$\mathrm{AC}^0$ circuits due to Jackson et al. [JKS02]. We elaborate on these results in Section 1.3 below.

1.1 Private Data Release Reduces to Learning Thresholds 

In this section we give more details on the reduction from privacy-preserving data release to learning thresh- 
olds. The full details are in Sections 3 and 4. We begin with loose definitions of the data release and learning 
tasks we consider, and then proceed with (a simple case of) our reduction. 

Counting Queries, Data Release and Learning Thresholds. We begin with preliminaries and an informal
specification of the data release and learning tasks we consider in our reduction (see Sections 2 and 3.1 for
full definitions). We refer to an element $u$ in data domain $U$ as an item. A database is a collection of $n$
items from $U$. A counting query is specified by a predicate $p : U \to \{0,1\}$, and the query $q_p$ on database
$D$ outputs the fraction of items in $D$ that satisfy $p$, i.e. $\frac{1}{n}\sum_{i=1}^{n} p(D_i)$. A class of counting queries is specified
by a set $Q$ of query descriptions and a predicate $P : Q \times U \to \{0,1\}$. For a query $q \in Q$, its corresponding
predicate is $P(q, \cdot) : U \to \{0,1\}$. We will sometimes fix a data item $u \in U$ and consider the predicate
$p_u(\cdot) = P(\cdot, u) : Q \to \{0,1\}$.

Fix a data domain $U$ and query class $Q$ (specified by a predicate $P$). A data release algorithm $\mathcal{A}$ gets as
input a database $D$, and outputs a synopsis $S : Q \to [0,1]$ that provides approximate answers to queries in
$Q$. We say that $\mathcal{A}$ is an $(\alpha,\beta,\gamma)$ distribution-free data release algorithm for $(U, Q, P)$ if, for any distribution
$G$ over the query set $Q$, with probability $1-\beta$ over the algorithm's coins, the synopsis $S$ satisfies that with
probability $1-\gamma$ over $q \sim G$, the (additive) error of $S$ on $q$ is bounded by $\alpha$. Later we will also consider
data release algorithms that only work for a specific distribution or class of distributions (in this case we
will not call the algorithm distribution-free). Finally, we assume for now that the data release algorithm only
accesses the distribution $G$ by sampling queries from it, but later we will also consider more general types
of access (see below). A differentially private data release algorithm is one whose output distribution (on
synopses) is differentially private as per Definition 2.1. See Definition 3.3 for full and formal details.

Fix a class $Q$ of examples and a set $\mathcal{F}$ of predicates on $Q$. Let $\mathcal{F}_{n,t}$ be the set of thresholded sums from $\mathcal{F}$,
i.e., the set of functions of the form $f = \mathbb{I}\{\frac{1}{n}\sum_{i=1}^{n} f_i \geq t\}$, where $f_i \in \mathcal{F}$ for all $1 \leq i \leq n$. We refer to functions
in $\mathcal{F}_{n,t}$ as $n$-thresholds. An algorithm for learning thresholds gets access to a function in $\mathcal{F}_{n,t}$ and outputs
a hypothesis $h : Q \to \{0,1\}$ that labels examples in $Q$. We say that it is a $(\gamma,\beta)$ distribution-free learning
algorithm for learning thresholds over $(Q, \mathcal{F})$ if, for any distribution $G$ over the set $Q$, with probability $1-\beta$
over the algorithm's coins the output hypothesis $h$ satisfies that with probability $1-\gamma$ over $q \sim G$, $h$ labels
$q$ correctly. As above, later we will also consider learning algorithms that are not distribution-free, and only
work for a specific distribution or class of distributions. For now, we assume that the learning algorithm
only accesses the distribution $G$ by drawing examples from it. These examples are labeled using the target
function that the algorithm is trying to learn. See Definition 3.5 for full and formal details.

The Reduction. We can now describe (a simple case of) our reduction from differentially private data
release to learning thresholds. For any data domain $U$, set $Q$ of query descriptions, and predicate
$P : Q \times U \to \{0,1\}$, the reduction shows how to construct a (distribution-free) data release algorithm given a
(distribution-free) algorithm for learning thresholds over $(Q, \{p_u : u \in U\})$, i.e., any algorithm for learning
thresholds where $Q$ is the example set and the set $\mathcal{F}$ of predicates (over $Q$) is obtained by the possible ways
of fixing the $u$-input to $P$. The resulting data release algorithm is $(\alpha,\beta,\gamma)$-accurate as long as the database
is not too small; the size bound depends on the desired accuracy parameters and on the learning algorithm's
sample complexity. The efficiency of the learning algorithm is preserved (up to mild polynomial factors).

Theorem 1.1 (Reduction from Data Release to Learning Thresholds, Simplified). Let $U$ be a data universe,
$Q$ a set of query descriptions, and $P : Q \times U \to \{0,1\}$ a predicate. There is an $\epsilon$-differentially private
$(\alpha,\beta,\gamma)$-accurate distribution-free data-release algorithm for $(U, Q, P)$, provided that:

1. there is a distribution-free learning algorithm $\mathcal{L}$ that $(\gamma,\beta)$-learns thresholds over $(Q, \{p_u : u \in U\})$
using $b(n,\gamma,\beta)$ labeled examples and running time $t(n,\gamma,\beta)$ for learning $n$-thresholds.

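2. $n \geq \frac{C \cdot b(n',\gamma',\beta') \cdot \log(1/\beta)}{\epsilon \cdot \alpha \cdot \gamma}$, where $n' = \Theta(\log|Q|/\alpha^2)$, $\beta' = \Theta(\beta \cdot \alpha)$, $\gamma' = \Theta(\gamma \cdot \alpha)$, $C = \Theta(1)$.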

Moreover, the data release algorithm only accesses the query distribution by sampling. The number of
samples taken is $O(b(n',\gamma',\beta') \cdot \log(1/\beta)/\gamma)$ and the running time is $\mathrm{poly}(t(n',\gamma',\beta'), n, 1/\alpha, \log(1/\beta), 1/\gamma)$.

Section 3.2 gives a formal (and more general) statement in Theorem 3.9. Section 3.3 gives a proof 
overview, and Section 4 gives the full proof. Note that, since the data release algorithm we obtain from this 
reduction is distribution free (i.e. works for any distribution on the query set) and only accesses the query 
distribution by sampling, it can be boosted to yield accurate answers on all the queries [DRV10].



A More General Reduction. For clarity of exposition, we gave above a simplified form of the reduction. 
This assumed that the learning algorithm is distribution-free (i.e. works for any distribution over exam- 
ples) and only requires sampling access to labeled examples. These strong assumptions enable us to get a 
distribution-free data release algorithm that only accesses the query distribution by sampling. 

We also give a reduction that applies even to distribution-specific learning algorithms that require (a 
certain kind of) oracle access to the function being learned. In addition to sampling labeled examples, the 
learning algorithm can: (i) estimate the distribution $G$ on any example $q$ by querying $q$ and receiving a
(multiplicative) approximation to the probability $G[q]$; and (ii) query an oracle for the function $f$ being
learned on any $q$ such that $G[q] \neq 0$. We refer to this as approximate distribution restricted oracle access,
see Definition 3.6. Note that several natural learning algorithms in the literature use oracle queries in this 
way; in particular, we show that this is true for Jackson's Harmonic Sieve Algorithm [Jac97], see Section 6. 

Our general reduction gives a data release algorithm for a class $\mathcal{G}_Q$ of distributions on the query set,
provided we have a learning algorithm which can also use approximate distribution restricted oracle access,
and which works for a slightly richer class of distributions $\mathcal{G}'_Q$ (a smooth extension, see Definition 3.8).
Again, several such algorithms (based on Fourier analysis) are known in the literature; our general reduction 
allows us to use them and obtain the new data release results outlined in Section 1.3. 

Related Work: Privacy and Learning. Our new reduction adds to the fruitful and growing interaction
between the fields of differentially private data release and learning theory. Prior works also explored this
connection. In our work, we "import" learning theory techniques by drawing a correspondence between
the database (in the data release setting), for which we want to approximate query answers, and the target
function (in the learning setting) which labels examples. Several other works have used this correspondence
(implicitly or explicitly), e.g. [DNR+09, DRV10, GHRU11]. A different view, in which queries in the data
release setting correspond to concepts in learning theory, was used in [BLR08] and also in [GHRU11].

There is also work on differentially private learning algorithms in which the goal is to give differentially
private variants of various learning algorithms [BDMN05, KLN+08].

1.2 Applications (Part I): Releasing Conjunctions 

We use the reduction of Theorem 1.1 to obtain new data release algorithms "automatically" from learning 
algorithms that satisfy the theorem's requirements. Here we describe the distribution-free data release algo- 
rithms we obtain for approximating conjunction counting queries. These use learning algorithms (which are 
themselves distribution-free and require only random examples) based on polynomial threshold functions. 

Throughout this section we fix the query class under consideration to be conjunctions. We take $U = \{0,1\}^d$,
and a (monotone) conjunction $q \in Q = \{0,1\}^d$ is satisfied by $u$ iff $\forall i$ s.t. $q_i = 1$ it is also the case
that $u_i = 1$. (Our monotone conjunction results extend easily to general non-monotone conjunctions with
parameters unchanged.^1) Our first result is an algorithm for releasing $k$-way conjunctions:

Theorem 1.2 (Distribution-Free Data Release for $k$-way Conjunctions). There is an $\epsilon$-differentially private
$(\alpha,\beta,\gamma)$-accurate distribution-free data release algorithm, which accesses the query distribution only by
sampling, for the class of $k$-way monotone Boolean conjunction queries. The algorithm has runtime $\mathrm{poly}(n)$
on databases of size $n$ provided that

say^ 



^1 To see this, extend the data domain to be $\{0,1\}^{2d}$, and for each item in the original domain include also its negation. General
conjunctions in the original data domain can now be treated as monotone conjunctions in the new data domain. Note that the locality
of a conjunction is unchanged. Our results in this section are for arbitrary distributions over the set of monotone conjunctions (over
the new domain), and so they will continue to apply to arbitrary distributions on general conjunctions over the original data domain.
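
The embedding described in this footnote can be sketched as follows. The function names and the dictionary encoding of a general conjunction (attribute index mapped to its required value) are illustrative assumptions, not notation from the paper.

```python
from itertools import product

def monotone_conj(q, u):
    # Satisfied iff every attribute selected by q is set to 1 in u.
    return int(all(ui == 1 for qi, ui in zip(q, u) if qi == 1))

def general_conj(literals, u):
    # literals: dict mapping attribute index -> required value (0 or 1).
    return int(all(u[i] == v for i, v in literals.items()))

def extend_item(u):
    # Append the negation of every attribute: {0,1}^d -> {0,1}^(2d).
    return tuple(u) + tuple(1 - ui for ui in u)

def to_monotone_query(literals, d):
    # A positive literal on i selects coordinate i; a negative one selects d + i.
    q = [0] * (2 * d)
    for i, v in literals.items():
        q[i if v == 1 else d + i] = 1
    return tuple(q)

d = 3
literals = {0: 1, 2: 0}  # x_0 AND (NOT x_2), a 2-way non-monotone conjunction
q = to_monotone_query(literals, d)

# The general conjunction on u agrees with the monotone one on the extended item.
agree = all(
    general_conj(literals, u) == monotone_conj(q, extend_item(u))
    for u in product((0, 1), repeat=d)
)
```

The monotone query selects exactly as many coordinates as the original conjunction has literals, which is the sense in which locality is unchanged.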



Since this is a distribution-free data release algorithm that only accesses the query distribution by
sampling, we can use the boosting results of [DRV10] and obtain a data release algorithm that generates (w.h.p.)
a synopsis that is accurate for all queries. This increases the running time to $d^k \cdot \mathrm{poly}(n)$ (because the
boosting algorithm needs to enumerate over all the $k$-way conjunctions). The required bound on the database size
increases slightly but our big-Oh notation hides this small increase. The corollary is stated formally below:

Corollary 1.3 (Boosted Data Release for $k$-way Conjunctions). There is an $\epsilon$-differentially private
$(\alpha, \beta, \gamma = 0)$-accurate distribution-free data release algorithm for the class of $k$-way monotone Boolean conjunction
queries with runtime $d^k \cdot \mathrm{poly}(n)$ on databases of size $n$, provided that

\ sa 

We also obtain a new data release algorithm for releasing the answers to all conjunctions: 

Theorem 1.4 (Distribution-Free Data Release for All Conjunctions). There is an $\epsilon$-differentially private
$(\alpha,\beta,\gamma)$-accurate distribution-free data release algorithm, which accesses the query distribution only by
sampling, for the class of all monotone Boolean conjunction queries. The algorithm has runtime $\mathrm{poly}(n)$ on
databases of size $n$, provided that

\ say^ 

Again, we can apply boosting to this result; this gives improvements over previous work for a certain
range of parameters (roughly $k \in [d^{1/3}, d^{2/3}]$). We omit the details.

Related Work on Releasing Conjunctions. Several past works have considered differentially private data
release for conjunctions and $k$-way conjunctions (also known as marginals and contingency tables). As a
corollary of their more general Laplace and Gaussian mechanisms, the work of Dwork et al. [DMNS06]
showed how to release all $k$-way conjunctions in running time $d^{O(k)}$ provided that the database size is at least
$d^{O(k)}$. Barak et al. [BCD+07] showed how to release consistent contingency tables with similar database
size bounds. The running time, however, was increased to $\exp(d)$. We note that our data-release algorithms
do not guarantee consistency. Gupta et al. gave distribution-specific data release algorithms for $k$-way and for
all conjunctions. These algorithms work for the uniform distribution over ($k$-way or general) conjunctions.
The database size bound and running time were (roughly) $d^{\tilde{O}(1/\alpha^2)}$. For distribution-specific data release on
the uniform distribution, the dependence on $d$ in their work is better than in our algorithms but the dependence
on $\alpha$ is worse. Finally, we note that the general information-theoretic algorithms for differentially private
data release also yield algorithms for the specific case of conjunctions. These algorithms are (significantly)
more computationally expensive, but they have better database size bounds. For example, the algorithm
of [HR10] has running time $\exp(d)$ but its database size bound is (roughly) $O(d/\alpha^2)$ (for the relaxed notion of
$(\epsilon, \delta)$-differential privacy).

In terms of negative results, Ullman and Vadhan [UV11] showed that, under mild cryptographic
assumptions, no data release algorithm for conjunctions (even 2-way) can output a synthetic database in running
time less than $\exp(d)$ (this holds even for distribution-specific data release on the uniform distribution). Our
results side-step this negative result because the algorithms do not release a synthetic database.

Kasiviswanathan et al. [KRSU10] showed a lower bound of $\tilde{\Omega}(\min\{d^{k/2}/\alpha, 1/\alpha^2\})$ on the database size
needed for releasing $k$-way conjunctions. To see that this is consistent with our bounds, note that our bound
on $n$ is always larger than $f(\alpha) = 2^{\sqrt{k \log(1/\alpha)}}/\alpha$. We have $f(\alpha) < 1/\alpha^2$ only if $k < \log(1/\alpha)$. But in the range
where $k < \log(1/\alpha)$ our theorem needs $n$ to be larger than $d^k/\alpha$ which is consistent with the lower bound.



1.3 Applications (Part II): Fourier-Based Approach 

We also use Theorem 1.1 (in its more general formulation given in Section 3.2) to obtain new data release
algorithms for answering parity counting queries (in polynomial time) and general $\mathrm{AC}^0$ counting queries
(in quasi-polynomial time). For both of these we fix the data universe to be $U = \{0,1\}^d$, and take the set
of query descriptions to also be $Q = \{0,1\}^d$ (with different semantics for queries in the two cases). Both
algorithms are distribution-specific, working for the uniform distribution over query descriptions,^2 and both
instantiate the reduction with learning algorithms that use Fourier analysis of the target function. Thus the
full data release algorithms use Fourier analysis of the database (viewed as a function on queries).

Parity Counting Queries. Here we consider counting queries that, for a fixed $q \in \{0,1\}^d$, output how
many items in the database have inner product 1 with $q$ (inner products are taken over GF[2]). I.e., we use
the parity predicate $P(q,u) = \sum_i q_i \cdot u_i \pmod 2$. We obtain a polynomial-time data release algorithm for
this class (w.r.t. the uniform distribution over queries). This uses our reduction, instantiated with Jackson's
Harmonic Sieve learning algorithm [Jac97]. In Section 6 we prove:
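
To make the Fourier view of a parity-query database concrete, the brute-force sketch below computes the Fourier coefficients of the database function $q \mapsto \frac{1}{n}\sum_{u \in D} P(q,u)$ for a toy database. It relies only on the standard identity $\langle q,u\rangle \bmod 2 = (1 - (-1)^{\langle q,u\rangle})/2$, so every nonzero coefficient sits at the all-zeros string or at an item of the database. This is an illustration, not the Harmonic Sieve; the toy database and function names are assumptions.

```python
from fractions import Fraction
from itertools import product

def inner_mod2(q, u):
    return sum(qi * ui for qi, ui in zip(q, u)) % 2

def f_D(q, D):
    # Fraction of items u in D with <q, u> = 1 over GF(2).
    return Fraction(sum(inner_mod2(q, u) for u in D), len(D))

def fourier_coefficient(s, D, d):
    # E_q[ f_D(q) * (-1)^<s, q> ] for q uniform over {0,1}^d.
    total = sum(f_D(q, D) * (-1) ** inner_mod2(s, q)
                for q in product((0, 1), repeat=d))
    return total / 2 ** d

d = 3
D = [(1, 0, 1), (1, 0, 1), (0, 1, 1), (0, 0, 0)]  # toy database, n = 4
n = len(D)
coeffs = {s: fourier_coefficient(s, D, d) for s in product((0, 1), repeat=d)}

def predicted(s):
    # From the identity: 1/2 at s = 0, minus (multiplicity of s in D) / (2n).
    base = Fraction(1, 2) if not any(s) else Fraction(0)
    return base - Fraction(D.count(s), 2 * n)

matches_identity = all(coeffs[s] == predicted(s) for s in coeffs)
```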

Theorem 1.5 (Uniform Distribution Data Release for Parity Counting Queries). There is an $\epsilon$-differentially
private algorithm for releasing the class of parity queries over the uniform distribution on $Q$. For databases
of size $n$, the algorithm has runtime $\mathrm{poly}(n)$ and is $(\alpha,\beta,\gamma)$-accurate, provided that

$$n \geq \frac{\mathrm{poly}(d, 1/\alpha, 1/\gamma, \log(1/\beta))}{\epsilon}.$$

$\mathrm{AC}^0$ Counting Queries. We also consider a quite general class of counting queries, namely, any query
family whose predicate is computed by a constant depth ($\mathrm{AC}^0$) circuit. For any family of this type, in
Section 6 we obtain a data release algorithm over the uniform distribution that requires a database of
quasi-polynomial (in $d$) size (and has running time polynomial in the database size, or quasi-polynomial in $d$).

Theorem 1.6 (Uniform Distribution Data Release for $\mathrm{AC}^0$ Counting Queries). Take $U = Q = \{0,1\}^d$, and
$P(q, u) : Q \times U \to \{0,1\}$ a predicate computed by a Boolean circuit of depth $\ell = O(1)$ and size $\mathrm{poly}(d)$.
There is an $\epsilon$-differentially private data release algorithm for this query class over the uniform distribution
on $Q$. For databases of size $n$, the algorithm has runtime $\mathrm{poly}(n)$ and is $(\alpha,\beta,\gamma)$-accurate, provided that:



\ sa^y I 



n^ d - - ^1 ^ I 

ea^y J 

This result uses our reduction instantiated with an algorithm of Jackson et al. [JKS02] for learning
Majority-of-$\mathrm{AC}^0$ circuits. To the best of our knowledge, this is the first positive result for private data release
that uses the (circuit) structure of the query class in a "non black-box" way to approximate the query answer.
We note that the class of $\mathrm{AC}^0$ predicates is quite rich. For example, it includes conjunctions, approximate
counting [Ajt83], and GF[2] polynomials with $\mathrm{polylog}(d)$ many terms. While our result is specific to the
uniform distribution over $Q$, we note that some query sets (and query descriptions) may be amenable to
random self-reducibility, where an algorithm providing accurate answers to uniformly random $q \in Q$ can be
used to get (w.h.p.) accurate answers to any $q \in Q$. We also note that Theorem 1.6 leaves a large degree of
freedom in how a class of counting queries is to be represented. Many different sets of query descriptions
$Q$ and predicates $P(q, u)$ can correspond to the same set of counting queries over the same $U$, and it may
well be the case that some representations are more amenable to computations in $\mathrm{AC}^0$ and/or random
self-reducibility. Finally, we note that the hardness results of Dwork et al. [DNR+09] actually considered (and
ruled out) efficient data-release algorithms for $\mathrm{AC}^0$ counting queries (even for the uniform distribution case),
but only when the algorithm's output is a synthetic database. Theorem 1.6 side-steps these negative results
because the output is not a synthetic database.

^2 More generally, we can get results for smooth distributions; we defer these to the full version.

2 Preliminaries 

Data sets and differential privacy. We consider a data universe $U$, where throughout this work we take
$U = \{0,1\}^d$. We typically refer to an element $u \in U$ as an item. A data set (or database) $D$ of size $n$ over the
universe $U$ is an ordered multiset consisting of $n$ items from $U$. We will sometimes think of $D$ as a tuple in
$U^n$. We use the notation $|D|$ to denote the size of $D$ (here, $n$). Two data sets $D, D'$ are called adjacent if they
are both of size $n$ and they agree in at least $n-1$ items (i.e., their edit distance is at most 1).

We will be interested in randomized algorithms that map data sets into some abstract range $\mathcal{R}$ and satisfy
the notion of differential privacy.

Definition 2.1 (Differential Privacy [DMNS06]). A randomized algorithm $\mathcal{M}$ mapping data sets over $U$ to
outcomes in $\mathcal{R}$ satisfies $(\epsilon, \delta)$-differential privacy if for all $S \subseteq \mathcal{R}$ and every pair of adjacent databases
$D, D'$, we have $\Pr(\mathcal{M}(D) \in S) \leq e^{\epsilon} \Pr(\mathcal{M}(D') \in S) + \delta$. If $\delta = 0$, we say the algorithm satisfies $\epsilon$-differential
privacy.
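
As a hedged illustration of the adjacency notion and of Definition 2.1 with $\delta = 0$, the sketch below checks numerically that Laplace noise of scale $1/(\epsilon n)$ on a single counting query keeps the output density ratio between two adjacent databases within $e^{\epsilon}$. The Laplace mechanism is the standard mechanism of [DMNS06]; the helper names, the toy databases, and the evaluation grid are assumptions. A pointwise density bound implies the set-wise bound of the definition by integration.

```python
import math

def are_adjacent(D1, D2):
    # Same size n and agreement in at least n - 1 positions (databases are ordered).
    return len(D1) == len(D2) and sum(a != b for a, b in zip(D1, D2)) <= 1

def counting_answer(D, pred):
    # Fraction of items in D satisfying the predicate.
    return sum(pred(u) for u in D) / len(D)

def laplace_density(x, mean, scale):
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)

eps = 0.5
D1 = [(1, 0), (1, 1), (0, 1), (1, 1)]
D2 = [(1, 0), (0, 0), (0, 1), (1, 1)]  # differs from D1 in exactly one position
first_bit = lambda u: u[0]

n = len(D1)
a1 = counting_answer(D1, first_bit)
a2 = counting_answer(D2, first_bit)

# Adjacent databases move a counting query by at most 1/n, so scale 1/(eps * n) suffices.
scale = 1.0 / (eps * n)
grid = [i / 100.0 for i in range(-200, 301)]
worst_ratio = max(
    laplace_density(x, a1, scale) / laplace_density(x, a2, scale) for x in grid
)
```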

Counting queries. A class of counting queries is specified by a predicate $P : Q \times U \to \{0,1\}$ where $Q$
is a set of query descriptions. Each $q \in Q$ specifies a query and the answer for a query $q \in Q$ on a single
data item $u \in U$ is given by $P(q, u)$. The answer of a counting query $q \in Q$ on a data set $D$ is defined as
$\frac{1}{n}\sum_{u \in D} P(q, u)$.

We will often fix a data item $u$ and database $D \in U^n$ of $n$ data items, and use the following notation:

- $p_u : Q \to \{0,1\}$, $p_u(q) \stackrel{\mathrm{def}}{=} P(q, u)$. The predicate on a fixed data item $u$.

- $f^D : Q \to [0,1]$, $f^D(q) \stackrel{\mathrm{def}}{=} \frac{1}{n}\sum_{u \in D} P(q, u)$. For an input query description and fixed database, counts
the fraction of database items that satisfy that query.

- $f^D_t : Q \to \{0,1\}$, $f^D_t(q) \stackrel{\mathrm{def}}{=} \mathbb{I}\{f^D(q) \geq t\}$. For an input query description and fixed database and
threshold $t \in [0,1]$, indicates whether the fraction of database items that satisfy that query is at least $t$.
Here and in the following $\mathbb{I}$ denotes the 0/1-indicator function.
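
The three objects in this notation can be written directly as small closures. This is a minimal sketch; the monotone conjunction predicate and the toy database are illustrative choices, and the function names are not from the paper.

```python
def monotone_conjunction(q, u):
    # P(q, u) = 1 iff every attribute selected by q is set in u.
    return int(all(ui == 1 for qi, ui in zip(q, u) if qi == 1))

def p_u(P, u):
    # Predicate on queries obtained by fixing the data item u.
    return lambda q: P(q, u)

def f_D(P, D):
    # Fraction of items in D that satisfy the query q.
    n = len(D)
    return lambda q: sum(P(q, u) for u in D) / n

def f_D_t(P, D, t):
    # Indicator that the fraction of satisfying items is at least t.
    fd = f_D(P, D)
    return lambda q: int(fd(q) >= t)

D = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 1, 1)]
q = (1, 1, 0)  # asks for attributes 0 and 1; 2 of the 4 items have both set

fraction = f_D(monotone_conjunction, D)(q)
label_half = f_D_t(monotone_conjunction, D, 0.5)(q)
label_high = f_D_t(monotone_conjunction, D, 0.75)(q)

# f^D is the average of the per-item predicates p_u, so f^D_t is a
# thresholded sum of n predicates from {p_u : u in U}.
average_of_p_u = sum(p_u(monotone_conjunction, u)(q) for u in D) / len(D)
```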

We close this section with some concrete examples of query classes that we will consider. Fix $U = \{0,1\}^d$
and $Q = \{0,1\}^d$. The query class of monotone boolean conjunctions is defined by the predicate
$P(q,u) = \bigwedge_{i : q_i = 1} u_i$. Note that we may equivalently write $P(q,u) = 1 - \bigvee_{i : u_i = 0} q_i$. The query class of
parities over $\{0,1\}^d$ is defined by the predicate $P(q,u) = \sum_{i : u_i = 1} q_i \pmod 2$.
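
The two forms of the conjunction predicate, and the parity predicate, can be checked exhaustively on a small cube. This is a brute-force sanity check under the assumption $d = 4$; the function names are illustrative.

```python
from itertools import product

def conj_and(q, u):
    # AND of u_i over the attributes selected by q (the empty AND is 1).
    return int(all(u[i] == 1 for i in range(len(q)) if q[i] == 1))

def conj_via_or(q, u):
    # 1 minus the OR of q_i over the positions where u_i = 0.
    return 1 - int(any(q[i] == 1 for i in range(len(u)) if u[i] == 0))

def parity(q, u):
    # Sum of q_i over the positions with u_i = 1, modulo 2.
    return sum(q[i] for i in range(len(u)) if u[i] == 1) % 2

d = 4
cube = list(product((0, 1), repeat=d))

# The two ways of writing the conjunction predicate agree on every (q, u).
forms_agree = all(conj_and(q, u) == conj_via_or(q, u) for q in cube for u in cube)

# The parity predicate equals the inner product of q and u over GF(2).
parity_is_inner_product = all(
    parity(q, u) == sum(a * b for a, b in zip(q, u)) % 2
    for q in cube for u in cube
)
```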

3 Private Data Release via Learning Thresholds 

In this section we describe our reduction from private data release to a related computational learning task of 
learning thresholded sums. Section 3.1 sets the stage, first introducing definitions for handling distributions
and access to an oracle, and then proceeds with notation and formal definitions of (non-interactive) data
release and of learning threshold functions. Section 3.2 formally states our main theorem giving the reduction,
and Section 3.3 gives an intuitive overview of the proof. The formal proof is then given in Section 4. 



3.1 Distribution access, data release, learning thresholds 

Definition 3.1 (Sampling or Evaluation Access to a Distribution). Let $G$ be a distribution over a set $Q$. When
we give an algorithm $\mathcal{A}$ sampling access to $G$, we mean that $\mathcal{A}$ is allowed to sample items distributed by
$G$. When we give an algorithm $\mathcal{A}$ evaluation access to $G$, we mean that $\mathcal{A}$ is both allowed to sample items
distributed by $G$ and also to make oracle queries: in such a query $\mathcal{A}$ specifies any $q \in Q$ and receives back
the probability $G[q] \in [0, 1]$ of $q$ under $G$. For both types of access we will often measure $\mathcal{A}$'s sample
complexity or number of queries (for the case of evaluation access).¹

Definition 3.2 (Sampling Access to Labeled Examples). Let $G$ be a distribution over a set $Q$ of potential
examples, and let $f$ be a function whose domain is $Q$. When we give an algorithm $\mathcal{A}$ sampling access to
labeled examples by $(G, f)$, we mean that $\mathcal{A}$ has sampling access to the distribution $(q, f(q))_{q \sim G}$.

Definition 3.3 (Data Release Algorithm). Fix $\mathcal{U}$ to be a data universe, $Q$ to be a set of query descriptions,
$\mathcal{G}_Q$ to be a set of distributions on $Q$, and $P(q, u)\colon Q \times \mathcal{U} \to \{0, 1\}$ to be a predicate. A $(\mathcal{U}, Q, \mathcal{G}_Q, P)$ data
release algorithm $\mathcal{A}$ is a (probabilistic) algorithm that gets sampling access to a distribution $G \in \mathcal{G}_Q$ and
takes as input accuracy parameters $\alpha, \beta, \gamma > 0$, a database size $n$, and a database $D \in \mathcal{U}^n$. $\mathcal{A}$ outputs a
synopsis $S\colon Q \to [0, 1]$.

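We say that $\mathcal{A}$ is $(\alpha, \beta, \gamma)$-accurate for databases of size $n$, if for every database $D \in \mathcal{U}^n$ and query
distribution $G \in \mathcal{G}_Q$:

$$\Pr_{S \leftarrow \mathcal{A}(n, D, \alpha, \beta, \gamma)} \left[ \Pr_{q \sim G} \left[ |S(q) - f^D(q)| > \alpha \right] > \gamma \right] \le \beta \qquad (1)$$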



We also consider data release algorithms that get evaluation access to $G$. In this case, we say that $\mathcal{A}$
is a data release algorithm using evaluation access. The definition is unchanged, except that $\mathcal{A}$ gets this
additional form of access to $G$.

When $P$ and $\mathcal{U}$ are understood from the context, we sometimes refer to a $(\mathcal{U}, Q, \mathcal{G}_Q, P)$ data release
algorithm as an algorithm for releasing the class of queries $Q$ over $\mathcal{G}_Q$.

This work focuses on differentially private data release algorithms, i.e. data release algorithms which
are $\varepsilon$-differentially private as per Definition 2.1 (note that such algorithms must be randomized). In such
data release algorithms, the probability of any output synopsis $S$ differs by at most an $e^{\varepsilon}$ multiplicative factor
between any two adjacent databases.

We note two cases of particular interest. The first is when $\mathcal{G}_Q$ is the set of all distributions over $Q$. In this
case, we say that $\mathcal{A}$ is a distribution-free data release algorithm. For such algorithms it is possible to apply
the "boosting for queries" results of [DRV10] and obtain a data release algorithm whose synopsis is (w.h.p.)
accurate on all queries (i.e. with $\gamma = 0$). We note that those boosting results apply only to data release
algorithms that access their distribution by sampling (i.e. they need not hold for data release algorithms that
use evaluation access).

A second case of interest is when $\mathcal{G}_Q$ contains only a single distribution, the uniform distribution over
all queries $Q$. In this case both sampling and evaluation access are easy to simulate.
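As an illustration of why the uniform case is easy, the following minimal sketch simulates both kinds of access from Definition 3.1 for the uniform distribution over $Q = \{0,1\}^d$. The class name `UniformQueries` and its methods `sample` and `prob` are hypothetical names chosen for this example.

```python
import random

class UniformQueries:
    """Uniform distribution over {0,1}^d with sampling and evaluation access."""

    def __init__(self, d, seed=None):
        self.d = d
        self.rng = random.Random(seed)

    def sample(self):
        """Sampling access: draw one query description uniformly at random."""
        return tuple(self.rng.randint(0, 1) for _ in range(self.d))

    def prob(self, q):
        """Evaluation access: return G[q], which is 2^-d for every valid q."""
        valid = len(q) == self.d and all(bit in (0, 1) for bit in q)
        return 2.0 ** (-self.d) if valid else 0.0

G = UniformQueries(d=3, seed=0)
q = G.sample()
print(q, G.prob(q))
```

Both operations take time linear in $d$, so neither form of access adds meaningful cost in this setting.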

Remark 3.4. Throughout this work, we fix the accuracy parameter $\alpha$, and lower-bound the database
size $n$ needed to ensure the (additive) approximation error is at most $\alpha$. An alternative approach, taken in
some of the differential privacy literature, is to fix the database size $n$ and upper-bound the approximation
error $\alpha$ as a function of $n$ (and of the other parameters). Our database size bounds can be converted to
error bounds in the natural way.



¹ Note that, generally speaking, sampling and evaluation access are incomparably powerful (see [KMR+94, Nao96]). In this
work, however, whenever we give an algorithm evaluation access we will also give it sampling access.



Definition 3.5 (Learning Thresholds). Let $Q$ be a set (which we now view as a domain of potential unlabeled
examples) and let $\mathcal{G}_Q$ be a set of distributions on $Q$. Let $\mathcal{F}$ be a set of predicates on $Q$, i.e. functions
$Q \to \{0, 1\}$. Given $t \in [0, 1]$, let $\mathcal{F}_{n,t}$ be the set of all threshold functions of the form $f = \mathbb{I}\{\frac{1}{n} \sum_{i=1}^{n} f_i \ge t\}$
where $f_i \in \mathcal{F}$ for all $1 \le i \le n$. We refer to functions in $\mathcal{F}_{n,t}$ as $n$-thresholds over $\mathcal{F}$. Let $\mathcal{L}$ be a (probabilistic)
algorithm that gets sampling access to labeled examples by a distribution $G \in \mathcal{G}_Q$ and a target function
$f \in \mathcal{F}_{n,t}$. $\mathcal{L}$ takes as input accuracy parameters $\gamma, \beta > 0$, an integer $n > 0$, and a threshold $t \in [0, 1]$. $\mathcal{L}$
outputs a boolean hypothesis $h\colon Q \to \{0, 1\}$.

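We say that $\mathcal{L}$ is a $(\gamma, \beta)$-learning algorithm for thresholds over $(Q, \mathcal{G}_Q, \mathcal{F})$ if for every $\gamma, \beta > 0$, every
$n$, every $t \in [0, 1]$, every $f \in \mathcal{F}_{n,t}$ and every $G \in \mathcal{G}_Q$, we have

$$\Pr_{h \leftarrow \mathcal{L}(n, t, \gamma, \beta)} \left[ \Pr_{q \sim G} \left[ h(q) \ne f(q) \right] > \gamma \right] \le \beta. \qquad (2)$$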

The definition is analogous for all other notions of oracle access (see e.g. Definition 3.6 below). 



3.2 Statement of the main theorem 

In this section we formally state our main theorem, which establishes a general reduction from private 
data release to learning certain threshold functions. The next definition captures a notion of oracle access 
for learning algorithms which arises in the reduction. The definition combines sampling access to labeled 
examples with a limited kind of evaluation access to the underlying distribution and black-box oracle access 
to the target function /. 

Definition 3.6 (approximate distribution-restricted oracle access). Let $G$ be a distribution over a domain $Q$,
and let $f$ be a function whose domain is $Q$. When we say that an algorithm $\mathcal{A}$ has approximate $G$-restricted
evaluation access to $f$, we mean that

1. $\mathcal{A}$ has sampling access to labeled examples by $(G, f)$; and

2. $\mathcal{A}$ can make oracle queries on any $q \in Q$, which are answered as follows: there is a fixed constant
$c \in [1/3, 3]$ such that (i) if $G[q] = 0$ the answer is $(0, \perp)$; and (ii) if $G[q] > 0$ the answer is a pair
$(c \cdot G[q], f(q))$.
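The oracle-query part of Definition 3.6 can be sketched as a closure over an explicit probability table. This is only an illustration: the helper name `make_restricted_oracle`, the dictionary representation of $G$, and the use of `None` for $\perp$ are assumptions made for the example.

```python
def make_restricted_oracle(G, f, c=1.0):
    """Build an approximate G-restricted evaluation oracle for f.

    G is a dict mapping queries to probabilities, f is the target function,
    and c is the fixed multiplicative distortion, required to lie in [1/3, 3].
    """
    if not (1.0 / 3.0 <= c <= 3.0):
        raise ValueError("c must lie in [1/3, 3]")

    def oracle(q):
        mass = G.get(q, 0.0)
        if mass == 0:
            # Outside the support of G the oracle reveals nothing about f.
            return (0.0, None)  # None stands in for the symbol "bottom"
        return (c * mass, f(q))

    return oracle

# Hypothetical three-point domain with one zero-mass query.
G = {"a": 0.5, "b": 0.5, "c": 0.0}
f = lambda q: int(q == "a")
oracle = make_restricted_oracle(G, f, c=2.0)
print(oracle("a"), oracle("c"))
```

The point of the restriction is visible in the zero-mass branch: the learner never sees $f(q)$ for a query that $G$ cannot produce.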

Remark 3.7. We remark that this is the type of oracle access provided to the learning algorithm in
our reduction. This is different from the oracle access that the data release algorithm has. We could extend
Definition 3.3 to refer to approximate evaluation access to G; all our results on data release using evaluation 
access would extend to this weaker access (under appropriate approximation guarantees). For simplicity, 
we focus on the case where the data release algorithm has perfectly accurate evaluation access, since this is 
sufficient throughout for our purpose. 

One might initially hope that privately releasing a class of queries $Q$ over some set of distributions $\mathcal{G}_Q$
reduces to learning corresponding threshold functions over the same set of distributions. However, our
reduction will need a learning algorithm that works for a potentially larger set of distributions $\mathcal{G}_Q' \supseteq \mathcal{G}_Q$.
(We will see in Theorem 3.9 that this poses a stronger requirement on the learning algorithm.) Specifically,
$\mathcal{G}_Q'$ will be a smooth extension of $\mathcal{G}_Q$ as defined next.

Definition 3.8 (smooth extensions). Given a distribution $G$ over a set $Q$ and a value $\mu \ge 1$, the $\mu$-smooth
extension of $G$ is the set of all distributions $G'$ which are such that $G'[q] \le \mu \cdot G[q]$ for all $q \in Q$. Given a
set of distributions $\mathcal{G}_Q$ and $\mu \ge 1$, the $\mu$-smooth extension of $\mathcal{G}_Q$, denoted $\mathcal{G}_Q'$, is defined as the set of all
distributions that are a $\mu$-smooth extension of some $G \in \mathcal{G}_Q$.
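A short sketch may help fix the definition. It checks smoothness for explicit probability tables and shows that conditioning $G$ on a set of total mass $Z$ yields a distribution that is $(1/Z)$-smooth with respect to $G$. The helper names `is_smooth_extension` and `condition` are illustrative, and the small tolerance `1e-12` is an assumption added to absorb floating-point rounding.

```python
def is_smooth_extension(G_prime, G, mu):
    """Check G'[q] <= mu * G[q] for every q (up to floating-point tolerance)."""
    return all(p <= mu * G.get(q, 0.0) + 1e-12 for q, p in G_prime.items())

def condition(G, keep):
    """Return (G conditioned on {q : keep(q)}, mass Z of that set under G)."""
    Z = sum(p for q, p in G.items() if keep(q))
    if Z == 0:
        raise ValueError("cannot condition on a set of mass zero")
    return {q: (p / Z if keep(q) else 0.0) for q, p in G.items()}, Z

# Hypothetical uniform distribution on four queries, conditioned on half of them.
G = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
G_cond, Z = condition(G, lambda q: q < 2)
print(Z, is_smooth_extension(G_cond, G, 1.0 / Z))
```

Conditioning on an event of mass at least $\gamma/2$ therefore stays within a $(2/\gamma)$-smooth extension, which is the kind of extension a learner must tolerate when examples are filtered.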



With these two definitions at hand, we can state our reduction in its most general form. We will combine
this general reduction with specific learning results to obtain concrete new data release algorithms in
Sections 5 and 6.

Theorem 3.9 (Main Result: Private Data Release via Learning Thresholds). Let $\mathcal{U}$ be a data universe, $Q$ a
set of query descriptions, $\mathcal{G}_Q$ a set of distributions over $Q$, and $P\colon Q \times \mathcal{U} \to \{0, 1\}$ a predicate.

Then, there is an $\varepsilon$-differentially private $(\alpha, \beta, \gamma)$-accurate data-release algorithm for databases of size
$n$ provided that

- there is an algorithm $\mathcal{L}$ that $(\gamma, \beta)$-learns thresholds over $(Q, \mathcal{G}_Q', \{p_u\colon u \in \mathcal{U}\})$, running in time
$t(n, \gamma, \beta)$ and using $b(n, \gamma, \beta)$ queries to an approximate distribution-restricted evaluation oracle for
the target $n$-threshold function, where $\mathcal{G}_Q'$ is the $(2/\gamma)$-smooth extension of $\mathcal{G}_Q$; and

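- we have

$$n \ge \frac{C \cdot b(n', \gamma', \beta') \cdot \log\left(\frac{b(n', \gamma', \beta')}{\alpha \gamma \beta}\right) \cdot \log(1/\beta')}{\varepsilon \alpha^2 \gamma}, \qquad (3)$$

where $n' = \Theta(\log |Q| / \alpha^2)$, $\beta' = \Theta(\beta \alpha)$, $\gamma' = \Theta(\gamma \alpha)$ and $C > 0$ is a sufficiently large constant.

The running time of the data release algorithm is $\mathrm{poly}(t(n', \gamma', \beta'), n, 1/\alpha, \log(1/\beta), 1/\gamma)$.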

The next remark points out two simple modifications of this theorem. 

Remark 3.10. 1. We can improve the dependence on $n$ in (3) by a factor of $\Theta(1/\alpha)$ in the case where the
learning algorithm $\mathcal{L}$ only uses sampling access to labeled examples. In this case the data release
algorithm also uses only sampling access to the query distribution $G$. The precise statement is given
in Theorem 4.10 which we present after the proof of Theorem 3.9.

2. A similar theorem holds for $(\varepsilon, \delta)$-differential privacy, where the requirement on $n$ in (3) is improved
to a requirement on $\sqrt{n}$ up to a $\log(1/\delta)$ factor. The proof is the same, except for a different (but
standard) privacy argument, e.g., using the Composition Theorem in [DRV10].

3.3 Informal proof overview 

Our goal in the data release setting is approximating the query answers $\{f^D(q)\}_{q \in Q}$. This is exactly the task
of approximating or learning a sum of $n$ predicates from the set $\mathcal{F} = \{p_u\colon u \in \mathcal{U}\}$. Indeed, each item $u$ in
the database specifies a predicate $p_u$, and for a fixed query $q \in Q$ we are trying to approximate the sum of
the predicates $f^D(q) = \frac{1}{n} \cdot \sum_{u \in D} p_u(q)$. We want to approximate such a sum in a privacy-preserving manner,
and so we will only permit limited access to the function $f^D$ that we try to approximate. In particular, we
will only allow a bounded number of noisy oracle queries to this function. Using standard techniques (i.e.
adding appropriately scaled Laplace noise [DMNS06]), an approximation obtained from a bounded number
of noisy oracle queries will be differentially private. It remains, then, to tackle the task of (i) learning a sum
of $n$ predicates from $\mathcal{F}$ using an oracle to the sum, and (ii) doing so using only a bounded (smaller than $n$)
number of oracle queries when we are provided noisy answers.

From Sums to Thresholds. Ignoring privacy concerns, it is straightforward to reduce the task of learning
a sum $f^D$ of predicates (given an oracle for $f^D$) to the task of learning thresholded sums of predicates (again
given an oracle for $f^D$). Indeed, set $k = \lceil 3/\alpha \rceil$ and consider the thresholds $t_1, \ldots, t_k$ given by $t_i = i/(k+1)$.
Now, given an oracle for $f^D$, it is easy to simulate an oracle for $f^D_{t_i}$ for any $t_i$. Thus, we can learn each of
the threshold functions $f^D_{t_i}$ to accuracy $1 - \gamma/k$ with respect to $G$. Call the resulting hypotheses $h_1, \ldots, h_k$.
Each $h_i$ labels a $(1 - \gamma/k)$-fraction of the queries/examples in $Q$ correctly w.r.t. the threshold function $f^D_{t_i}$. We
can produce an aggregated hypothesis $h$ for approximating $f^D$ as follows: given a query/example $q$, let $h(q)$
equal $t_i$, where $i$ is the smallest index such that $h_i(q) = 0$ and $h_{i+1}(q) = 1$. For random $q \sim G$, we will then have
$|h(q) - f^D(q)| \le \alpha/3$ with probability $1 - \gamma$ (over the choice of $q$).
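The aggregation step can be sketched as follows, in a simplified setting where every threshold label is exact. The sketch assumes the labels approximate $\mathbb{I}\{f^D(q) \ge t_i\}$, so they switch from 1 to 0 as $i$ grows, and it returns the largest threshold labeled 1 (or 0 if there is none); this orientation and the names `thresholds` and `aggregate` are choices made for the example rather than the paper's exact rule.

```python
import math

def thresholds(alpha):
    """Thresholds t_i = i/(k+1) for i = 1..k, with k = ceil(3/alpha)."""
    k = math.ceil(3 / alpha)
    return [i / (k + 1) for i in range(1, k + 1)]

def aggregate(labels, ts):
    """Combine threshold labels into one estimate of the underlying sum.

    labels[i] is assumed to approximate the indicator of f(q) >= ts[i].
    Returns the largest threshold labeled 1, or 0.0 if no label is 1.
    """
    estimate = 0.0
    for t, label in zip(ts, labels):
        if label == 1:
            estimate = t
    return estimate

alpha = 0.5
ts = thresholds(alpha)

# With exact labels the estimate is within one grid step of the true value.
true_value = 0.62
exact_labels = [int(true_value >= t) for t in ts]
print(aggregate(exact_labels, ts))
```

Because consecutive thresholds are $1/(k+1) \le \alpha/3$ apart, exact labels give an estimate within $\alpha/3$ of the true value; the paper's argument then accounts for the labels that the learned hypotheses get wrong.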

Thus, we have reduced learning a sum to learning thresholded sums (where in both cases the learning
is done with an oracle for the sum). But because of privacy considerations, we must address the challenges
mentioned above: (i) learning a thresholded sum of $n$ predicates using few (less than $n$) oracle queries to the
sum, and (ii) learning when the oracle for the sum can return noisy answers. In particular, the noisy sum
answers can induce errors on threshold oracle queries (when the sum is close to the threshold).

Restricting to Large Margins. Let us say that a query/example $q \in Q$ has low margin with respect to
$f^D$ and $t_i$ if $|f^D(q) - t_i| < \alpha/7$. A useful observation is that in the argument sketched above, we do not
need to approximate each threshold function $f^D_{t_i}$ well on low margin elements $q$. Indeed, suppose that each
hypothesis $h_i$ errs arbitrarily on a set $E_i \subseteq Q$ that contains only inputs that have low margin w.r.t. $f^D$ and
$t_i$, but achieves high accuracy $1 - \gamma/k$ with respect to $G$ conditioned on the event $Q \setminus E_i$. Then the above
aggregated hypothesis $h$ would still have high accuracy with high probability over $q \sim G$; more precisely, $h$
would satisfy $|h(q) - f^D(q)| \le 2\alpha/3$ with probability $1 - \gamma$ for $q \sim G$.

The reason is that for every $q \in Q$, there can only be one threshold $i^* \in \{1, \ldots, k\}$ such that $|f^D(q) - t_{i^*}| <
\alpha/7$ (since any two thresholds are $\alpha/3$-apart from each other). While the threshold hypothesis $h_{i^*}$ might err
on $q$ (because $q$ has low margin w.r.t. $t_{i^*}$), the hypotheses $h_{i^*-1}$ and $h_{i^*+1}$ should still be accurate (w.h.p. over
$q \sim G$), and thus the aggregated hypothesis $h$ will still output a value between $t_{i^*-1}$ and $t_{i^*+1}$.

Threshold Access to The Data Set. We will use the above observation to our advantage. Specifically, we
restrict all access to the function $f^D$ to what we call a threshold oracle. Roughly speaking, the threshold
oracle (which we denote TO and define formally in Section 4.1) works as follows: when given a query $q$ and
a threshold $t$, it draws a suitably scaled Laplacian variable $N$ (used to ensure differential privacy) and returns
1 if $f^D(q) + N \ge t + \alpha/20$; returns 0 if $f^D(q) + N \le t - \alpha/20$; and returns "$\perp$" if $t - \alpha/20 < f^D(q) + N < t + \alpha/20$.
If $D$ is large enough then we can ensure that $|N| < \alpha/40$ with high probability, and thus whenever the oracle
outputs $\perp$ on a query $q$ we know that $q$ has low margin with respect to $f^D$ and $t$ (since $\alpha/20 + |N| < \alpha/7$).

We will run the learning algorithm $\mathcal{L}$ on examples generated using the oracle TO after removing all
examples for which the oracle returned $\perp$. Since we are conditioning on the TO oracle not returning $\perp$,
this transforms the distribution $G$ into a conditional distribution which we denote $G'$. Since we have only
conditioned on removing low-margin $q$'s, the argument sketched above applies. That is, the hypothesis that
has high accuracy with respect to this conditional distribution $G'$ is still useful for us.

So the threshold oracle lets us use noisy sum answers (allowing the addition of noise and differential 
privacy), but in fact it also addresses the second challenge of reducing the query complexity of the learning 
algorithm. This is described next. 

Savings in Query Complexity via Subsampling. The remaining challenge is that the threshold oracle
can be invoked only (at most) $n$ times before we exceed our "privacy budget". This is problematic, because
the query complexity of the underlying learning algorithm may well depend on $n$, since $f^D$ is a sum of $n$
predicates. To reduce the number of oracle queries that need to be made, we observe that the sum of $n$
predicates that we are trying to learn can actually be approximated by a sum of fewer predicates. In fact,
there exists a sum $f^{D'}$ of $n' = O(\log |Q| / \alpha^2)$ predicates from $\mathcal{F}$ that is $\alpha/100$-close to $f^D$ on all inputs in $Q$,
i.e. $|f^D(q) - f^{D'}(q)| \le \alpha/100$ for all $q \in Q$. (The proof is by a subsampling argument, as in [BLR08]; see
Section 4.1.) We will aim to learn this "smaller" sum. The hope is that the query complexity for learning
$f^{D'}$ may be considerably smaller, namely scaling with $n'$ rather than $n$. Notice, however, that learning a
threshold of $f^{D'}$ requires a threshold oracle to $f^{D'}$, rather than the threshold oracle we have, which is to $f^D$.
Our goal, then, is to use the threshold oracle to $f^D$ to simulate a threshold oracle to $f^{D'}$. This will give us
"the best of both worlds": we can make (roughly) $O(n)$ oracle queries thus preserving differential privacy,
while using a learning algorithm that is allowed to have query complexity superlinear in $n'$.

The key observation showing that this is indeed possible is that the threshold oracle TO already "avoids"
low-margin queries where $f^D_t$ and $f^{D'}_t$ might disagree! Whenever the threshold oracle TO (w.r.t. $D$) answers
$l \ne \perp$ on a query $q$, we must have $|f^D(q) - t| \ge \alpha/20 - |N| > \alpha/100$, and thus $f^D_t(q) = f^{D'}_t(q)$. Moreover, it
is still the case that TO only answers $\perp$ on queries $q$ that have low margins w.r.t. $f^{D'}$. This means that, as
above, we can run $\mathcal{L}$ using TO (w.r.t. $D$) in order to learn $f^{D'}$. The query complexity depends on $n'$ and is
therefore independent of $n$. At the same time, we continue to answer all queries using the threshold oracle
with respect to $f^D$ so that our privacy budget remains on the order $|D| = n$. Denoting the query complexity
of the learning algorithm by $b(n')$ we only need that $n \gg b(n')$. This allows us to use learning algorithms
that have $b(n') \gg n'$ as is usually the case.

Sampling from the conditional distribution. In the exposition above we glossed over one technical detail,
which is that the learning algorithm requires sampling (or distribution restricted) access to the distribution
$G'$ over queries $q$ on which TO does not return $\perp$, whereas the data release algorithm we are trying to build
only has access to the original distribution $G$. We reconcile this disparity as follows.

For a threshold $t$, let $\zeta_t$ denote the probability that the oracle TO does not return $\perp$ when given a random
$q \sim G$ and the threshold $t$. There are two cases depending on $\zeta_t$:

$\zeta_t < \gamma$: This means that the threshold $t$ is such that with probability $1 - \gamma$ a random sample $q \sim G$ has low
margin with respect to $f^D$ and $t$. In this case, by simply outputting the constant-$t$ function as our
approximation for $f^D$, we get a hypothesis that has accuracy $\alpha/3$ with probability $1 - \gamma$ over random
$q \sim G$.

$\zeta_t \ge \gamma$: In this case, the conditional distribution $G'$ induced by the threshold oracle is $1/\gamma$-smooth w.r.t.
$G$. In particular, $G'$ is contained in the smooth extension $\mathcal{G}_Q'$ for which the learning algorithm is
guaranteed to work (by the conditions of Theorem 3.9). This means that we can sample from $G'$
using rejection sampling from $G$. It suffices to oversample by a factor of $O(1/\gamma)$ to make sure that we
receive enough examples that are not rejected by the threshold oracle.
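The rejection-sampling step in the second case can be sketched as below. The function name `rejection_sample`, the `accepts` callback (standing in for "the threshold oracle did not return $\perp$"), and the oversampling constant 40 are assumptions made for this illustration.

```python
import random

def rejection_sample(sample_G, accepts, m, max_draws):
    """Collect up to m accepted draws, using at most max_draws samples from G.

    sample_G draws one query from G; accepts(q) says whether q survives the
    filter. The accepted draws are distributed as G conditioned on acceptance.
    """
    kept = []
    for _ in range(max_draws):
        if len(kept) == m:
            break
        q = sample_G()
        if accepts(q):
            kept.append(q)
    return kept

# Hypothetical setup: G is uniform on [0, 1) and a gamma-fraction is accepted.
rng = random.Random(0)
gamma = 0.25
m = 50
draws = rejection_sample(rng.random, lambda q: q < gamma, m, max_draws=40 * m)
print(len(draws))
```

When the acceptance probability is at least $\gamma$, a budget of order $m/\gamma$ draws yields $m$ accepted examples with high probability, which is the oversampling factor mentioned in the text.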

Finally, using a reasonably accurate estimate of $\zeta_t$, we can also implement the distribution-restricted approximate
oracle access that may be required by the learning algorithm. We omit the details from this informal
overview.

4 Proof of Theorem 3.9 

In this section, we give a formal proof of Theorem 3.9. We formalize and analyze the threshold oracle first. 
Then we proceed to our main reduction. 

4.1 Threshold access and subsampling

We begin by describing the threshold oracle that we use to access the function $f^D$ throughout our reduction;
it is presented in Figure 1. The oracle has two purposes. One is to ensure differential privacy by adding
noise every time we access $f^D$. The other purpose is to "filter out" queries that are too close to the given
threshold. This will enable us to argue that the threshold oracle for $f^D_t$ agrees with the function $f^{D'}_t$ where
$D'$ is a small subsample of $D$.

Throughout the remainder of this section we fix all input parameters to our oracle, i.e. the data set $D$ and
the values $b, \alpha > 0$. We let $\beta > 0$ denote the desired error probability of our algorithm.

Input: data set $D$ of size $n$, tolerance $\alpha > 0$, query bound $b \in \mathbb{N}$.
Threshold Oracle $TO(D, \alpha, b)$:

- When invoked on the $j$-th query $(q, t) \in Q \times [0, 1]$, do the following:

- If $j > b$, output $\perp$ and terminate.

- If $(q, t')$ has not been asked before for any threshold $t'$, sample a fresh Laplacian variable
$N_q \sim \mathrm{Lap}(b/\varepsilon n)$ and put $A_q = f^D(q) + N_q$. Otherwise reuse the previously created value $A_q$.

- Output

$$\begin{cases} 0 & \text{if } A_q \le t - 2\alpha/3, \\ 1 & \text{if } A_q \ge t + 2\alpha/3, \\ \perp & \text{otherwise.} \end{cases}$$

Figure 1: Threshold oracle for $f^D$. This threshold oracle is the only way in which the data release algorithm ever
interacts with the data set $D$. Its purpose is to ensure privacy and to reject queries that are too close to a given threshold.
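A minimal sketch of the oracle in Figure 1 follows. The class name `ThresholdOracle`, the `BOTTOM` sentinel, passing the predicate `P` and the privacy parameter `eps` to the constructor, and drawing Laplace noise as a difference of two exponential variables are implementation choices for this illustration only.

```python
import random

BOTTOM = None  # stands in for the "bottom" answer of the oracle

class ThresholdOracle:
    """Sketch of TO(D, alpha, b) with Laplace noise of scale b / (eps * n)."""

    def __init__(self, D, P, alpha, b, eps, seed=None):
        self.D, self.P = D, P
        self.alpha, self.b, self.eps = alpha, b, eps
        self.rng = random.Random(seed)
        self.noisy = {}   # query q -> cached noisy answer A_q
        self.count = 0    # number of queries received so far

    def _laplace(self, scale):
        # The difference of two i.i.d. Exp(1) variables is Laplace(0, 1).
        return scale * (self.rng.expovariate(1.0) - self.rng.expovariate(1.0))

    def query(self, q, t):
        self.count += 1
        if self.count > self.b:
            return BOTTOM  # query bound exceeded
        if q not in self.noisy:
            n = len(self.D)
            exact = sum(self.P(q, u) for u in self.D) / n
            self.noisy[q] = exact + self._laplace(self.b / (self.eps * n))
        a = self.noisy[q]
        if a <= t - 2 * self.alpha / 3:
            return 0
        if a >= t + 2 * self.alpha / 3:
            return 1
        return BOTTOM  # noisy answer too close to the threshold

# Hypothetical usage: one-bit data items, predicate "the bit is set".
D = [(1,), (1,), (0,), (1,)]
P = lambda q, u: u[0]
oracle = ThresholdOracle(D, P, alpha=0.3, b=10, eps=1.0, seed=0)
print(oracle.query("is_set", 0.1))
```

Caching $A_q$ per query $q$ means that asking about the same $q$ at several thresholds consumes no additional noise draws, matching the reuse rule in the figure.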

Lemma 4.1. Call two queries $(q, t), (q', t')$ distinct if $q \ne q'$. Then, the threshold oracle $TO(D, \alpha, b)$ answers
any sequence of $b$ distinct adaptive queries to $f^D$ with $\varepsilon$-differential privacy.

Proof. This follows directly from the guarantees of the Laplacian mechanism as shown in [DMNS06]. ■

Our goal is to use the threshold oracle for $f^D_t$ to correctly answer queries to the function $f^{D'}_t$ where $D'$
is a smaller (sub-sampled) database that gives "close" answers to $D$ on all queries $q \in Q$. The next lemma
shows that there always exists such a smaller database.

Lemma 4.2. For any $\alpha > 0$, there is a database $D'$ of size

$$|D'| \le \frac{10 \cdot \log |Q|}{\alpha^2} \qquad (4)$$

such that

$$\max_{q \in Q} |f^D(q) - f^{D'}(q)| < \alpha.$$

Proof. The existence of $D'$ follows from a subsampling argument as shown in [BLR08]. ■
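To see why a sample of size $O(\log |Q| / \alpha^2)$ is plausible, one standard calculation (not necessarily the exact argument of [BLR08]) draws $m$ items from $D$ with replacement, applies Hoeffding's inequality to each query, and takes a union bound over $Q$: the failure probability is at most $2|Q| e^{-2m\alpha^2}$, which is below 1 once $m > \ln(2|Q|) / (2\alpha^2)$. The sketch below evaluates this bound; the function name `hoeffding_subsample_size` is an assumption for the example.

```python
import math

def hoeffding_subsample_size(num_queries, alpha):
    """Smallest integer m with 2 * num_queries * exp(-2 * m * alpha**2) < 1.

    For such m, a random size-m subsample (with replacement) approximates
    every one of the num_queries averages to within alpha with positive
    probability, so a good subsample of that size exists.
    """
    bound = math.log(2 * num_queries) / (2 * alpha ** 2)
    return math.floor(bound) + 1

num_queries = 2 ** 20   # e.g. all query descriptions in {0,1}^20
alpha = 0.1
m = hoeffding_subsample_size(num_queries, alpha)
print(m, 10 * math.log(num_queries) / alpha ** 2)
```

For this setting the Hoeffding size is well below $10 \log |Q| / \alpha^2$, so the constant in a bound of this shape leaves room to spare.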

The next lemma states the two main properties of the threshold oracle that we need. To state them more
succinctly, let us denote by

$$Q(t, \alpha) = \{q \in Q\colon |f^D(q) - t| \ge \alpha\}$$

the set of elements in $Q$ that are $\alpha$-far from the threshold $t$.
Lemma 4.3 (Agreement). Suppose $D$ satisfies

$$|D| \ge \frac{\cdots}{\varepsilon \alpha}. \qquad (5)$$

Then, there is a data set $D'$ of size $|D'| \le 90 \cdot \alpha^{-2} \log |Q|$ and an event $\Gamma$ (only depending on the choice of the
Laplacian variables) such that $\Gamma$ has probability $1 - \beta$ and if $\Gamma$ occurs, then $TO(D, \alpha, b)$ has the following
guarantee: whenever $TO(D, \alpha, b)$ outputs $l$ on one of the queries $(q, t)$ in the sequence, then

1. if $l \ne \perp$ then $l = f^D_t(q) = f^{D'}_t(q)$, and

2. if $l = \perp$ then $q \notin Q(t, \alpha)$.
Proof. Let $D'$ be the data set given by Lemma 4.2 with its "$\alpha$" value set to $\alpha/3$ so that

$$|f^D(q) - f^{D'}(q)| < \alpha/3$$

for every input $q \in Q$.

The event $\Gamma$ is defined as the event that every Laplacian variable $N_q$ sampled by the oracle has magnitude
$|N_q| < \alpha/3$. Under the given assumption on $|D|$ in (5) and using basic tail bounds for the Laplacian distribution,
this happens with probability $1 - \beta$.

Assuming $\Gamma$ occurs, the following two statements hold:

1. Whenever the oracle outputs $l \ne \perp$ on a query $(q, t)$, then we must have either $f^D(q) + N_q - t \ge 2\alpha/3$
(and thus both $f^D(q) > t + \alpha/3$ and $f^{D'}(q) > t$) or else $f^D(q) + N_q - t \le -2\alpha/3$ (and thus both
$f^D(q) < t - \alpha/3$ and $f^{D'}(q) < t$). This proves the first claim of the lemma.

2. Whenever $q \in Q(t, \alpha)$, then $|f^D(q) + N_q - t| \ge 2\alpha/3$, and therefore the oracle does not output $\perp$. This
proves the second claim of the lemma.
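The tail-bound step can be checked numerically. For $N \sim \mathrm{Lap}(s)$ one has $\Pr(|N| \ge x) = e^{-x/s}$, so with scale $s = b/(\varepsilon n)$ a union bound over at most $b$ noise variables gives failure probability at most $b \cdot e^{-\alpha \varepsilon n / (3b)}$, which is at most $\beta$ once $n \ge 3b \ln(b/\beta) / (\varepsilon \alpha)$. The constant 3 comes from this calculation and is not claimed to match the paper's condition on $|D|$; the function names are illustrative.

```python
import math

def failure_bound(n, b, alpha, eps):
    """Union bound on Pr[some |N_q| >= alpha/3] over b Lap(b/(eps*n)) draws."""
    scale = b / (eps * n)
    return b * math.exp(-(alpha / 3) / scale)

def sufficient_db_size(b, alpha, beta, eps):
    """A database size n for which failure_bound(n, ...) is at most beta."""
    return math.ceil(3 * b * math.log(b / beta) / (eps * alpha))

b, alpha, beta, eps = 100, 0.1, 0.05, 1.0
n = sufficient_db_size(b, alpha, beta, eps)
print(n, failure_bound(n, b, alpha, eps))
```

The required size grows like $b \log(b/\beta) / (\varepsilon \alpha)$, which is the shape of dependence on the query bound that the reduction has to budget for.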



4.2 Privacy-preserving reduction 

In this section we describe how to convert a non-private learning algorithm for threshold functions of the
form $f^{D'}_t$ to a privacy-preserving learning algorithm for functions of the form $f^D$. The reduction is presented
in Figure 2. We call the algorithm PrivLearn.

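Setting of parameters. In the description of PrivLearn we use the following setting of parameters:

$$n' = \frac{4410 \cdot \log |Q|}{\alpha^2}, \qquad k = \lceil 3/\alpha \rceil, \qquad \gamma' = \frac{\gamma}{k}, \qquad \beta' = \frac{\beta}{6k} \qquad (6)$$

$$b_{\mathrm{base}} = b(n', \gamma', \beta'), \qquad b_{\mathrm{iter}} = \frac{\cdots}{\gamma}, \qquad b_{\mathrm{total}} = 2k \cdot b_{\mathrm{iter}} \qquad (7)$$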

Analysis of the reduction. Throughout the analysis of the algorithm we keep all input parameters fixed
so as to satisfy the assumptions of Theorem 3.9. Specifically we will need

$$|D| \ge \frac{210 \cdot b_{\mathrm{total}} \cdot \log(10\, b_{\mathrm{total}} / \beta)}{\varepsilon \alpha}. \qquad (8)$$

We have made no attempt to optimize various constants throughout.

Lemma 4.4 (Privacy). Algorithm PrivLearn satisfies $\varepsilon$-differential privacy.

Proof. In each iteration of the loop in Step 3 the algorithm makes at most $2b_{\mathrm{iter}}$ queries to TO (there are
$b_{\mathrm{iter}}$ calls made on the samples and at most $b_{\mathrm{base}} \le b_{\mathrm{iter}}$ evaluation queries). But note that TO is instantiated
with a query bound of $b_{\mathrm{total}} = 2k\, b_{\mathrm{iter}}$. Hence, it follows from Lemma 4.1 that TO satisfies $\varepsilon$-differential
privacy. Since TO is the only way in which PrivLearn ever interacts with the data set, PrivLearn satisfies
$\varepsilon$-differential privacy. ■

We now prove that the hypothesis produced by the algorithm is indeed accurate, as formalized by the
following lemma.

Input: Distribution $G \in \mathcal{G}_Q$, data set $D$ of size $n$, accuracy parameters $\alpha, \beta, \gamma > 0$; learning algorithm $\mathcal{L}$
for thresholds over $(Q, \mathcal{G}_Q', \mathcal{F})$ as in Theorem 3.9 requiring $b(n', \gamma', \beta')$ labeled examples and approximate
restricted evaluation access to the target function.

Parameters: See (6) and (7).

Algorithm PrivLearn for privately learning $f^D$:

1. Let TO denote an instantiation of $TO(D, \alpha/7, b_{\mathrm{total}})$.

2. Sample $b_{\mathrm{iter}}$ points $\{q_j\}_{1 \le j \le b_{\mathrm{iter}}}$ from $G$.

3. For each iteration $i \in \{1, \ldots, k\}$:

(a) Let $t_i = i/(k+1)$.

(b) For each $q_j$, $j \in [b_{\mathrm{iter}}]$, send the query $(q_j, t_i)$ to TO and let $l_j$ denote the answer. Let $B_i =
\{j\colon l_j \ne \perp\}$.

(c) If $|B_i| / b_{\mathrm{iter}} < \cdots$, output the constant $t_i$ function as hypothesis $h$ and terminate the algorithm.

(d) Run the learning algorithm $\mathcal{L}(n', t_i, \gamma', \beta')$ on the labeled examples $\{(q_j, l_j)\}_{j \in B_i}$, answering
evaluation queries from $\mathcal{L}$ as follows:

- Given a query $q$ posed by $\mathcal{L}$, let $l$ be the answer of TO on $(q, t_i)$.

- If $l = \perp$, then output $(0, \perp)$. Otherwise, output $(G[q] \cdot b_{\mathrm{iter}} / |B_i|,\, l)$.

(e) Let $h_i$ denote the resulting hypothesis.

4. Having obtained hypotheses $h_1, \ldots, h_k$, the final hypothesis $h$ is defined as follows: $h(q)$ equals the
smallest $i \in [k]$ such that $h_i(q) = 1$ and $h_{i-1}(q) = 0$ (we take $h_0(q) = 0$ and $h_{k+1}(q) = 1$).

Figure 2: Reduction from private data release to learning thresholds (non-privately).

Lemma 4.5 (Accuracy). With overall probability $1 - \beta$, the hypothesis $h$ returned by PrivLearn satisfies

$$\Pr_{q \sim G} \left\{ |h(q) - f^D(q)| \le \alpha \right\} \ge 1 - \gamma. \qquad (9)$$

Proof. We consider three possible cases:

1. The first case is that there exists a $t \in \{t_1, \ldots, t_k\}$ such that distribution $G$ has at least $1 - \gamma/10$ of its
mass on points that are $\alpha$-close to $t$. In this case a Chernoff bound and the choice of $b_{\mathrm{iter}} \gg b_{\mathrm{base}}$
imply that with probability $1 - \beta$ the algorithm terminates prematurely and the resulting hypothesis
satisfies (9).

2. In the second case, there exists a $t \in \{t_1, \ldots, t_k\}$ such that the probability mass $G$ puts on points that are
$\alpha$-close to $t$ is between $1 - \gamma$ and $1 - \gamma/10$. In this case if the algorithm terminates prematurely then (9)
is satisfied; below we analyze what happens assuming the algorithm does not terminate prematurely.

3. In the third case every $t \in \{t_1, \ldots, t_k\}$ is such that $G$ puts less than $1 - \gamma$ of its mass on points $\alpha$-close to
$t$. In this third case if the algorithm terminates prematurely then (9) will not hold; however, our choice
of $b_{\mathrm{iter}}$ implies that in this third case the algorithm terminates prematurely with probability at most
$\beta$. As in the second case, below we will analyze what happens assuming the algorithm does not
terminate prematurely.

Thus in the remainder of the argument we may assume without loss of generality that the algorithm does
not terminate prematurely, i.e. it produces a full sequence of hypotheses $h_1, \ldots, h_k$. Furthermore, we can
assume that the distribution $G$ places at most $1 - \gamma/10$ fraction of its weight near any particular threshold $t_i$.
This leads to the following claim, showing that in all iterations, the number of labeled examples in $B_i$ is
large enough to run the learning algorithm.



Claim 4.6. $\Pr\{\forall i\colon |B_i| \ge b_{\mathrm{base}}\} \ge 1 - \frac{\beta}{3}$.

Proof. By our assumption, the probability that a sample $q \sim G$ is rejected at step $i$ of PrivLearn is at
most $1 - \gamma/10$. By the choice of $b_{\mathrm{iter}}$ it follows that $|B_i| \ge b_{\mathrm{base}}$ with probability $1 - \beta/3k$. Taking a union bound
over all thresholds $t_i$ completes the proof. ■

The proof strategy from here on is to first analyze the algorithm on the conditional distribution that is 
induced by the threshold oracle. We will then pass from this conditional distribution to the actual distribution 
that we are interested in, namely, G. 

We chose $|D|$ large enough so that we can apply Lemma 4.3 to TO with the "$\alpha$"-setting of Lemma 4.3
set to $\alpha/7$. Let $D'$ be the data set and $\Gamma$ be the event given in the conclusion of Lemma 4.3 applied to TO.
(Note that $n' \ge |D'|$ since $|D'| \le 7^2 \cdot 90\, \alpha^{-2} \log |Q|$, as stated above.)

By the choice of our parameters, we have

$$\Pr(\Gamma) \ge 1 - \frac{\beta}{3}. \qquad (10)$$

Here the probability is computed only over the internal randomness of the threshold oracle TO which we
denote by $R$. Fix the randomness $R$ of TO such that $R \in \Gamma$. For the sake of analysis, we can think of the
randomness of the oracle as a collection of independent random variables $(N_q)_{q \in Q}$ (where $N_q$ is used to
answer all queries of the form $(q, t')$). In particular, the behavior of the oracle would not change if we were
to sample all variables $(N_q)_{q \in Q}$ up front. When we fix $R$ we thus mean that we fix $N_q$ for all $q \in Q$.

We may therefore assume for the remainder of the analysis that TO satisfies properties (1) and (2) of 
Lemma 4.3. 

Let us denote by $Q_i \subseteq Q$ the set of examples for which TO would not answer $\perp$ in Step 3 at the $i$-th
iteration of the algorithm. Note that this is a well-defined set since we fixed the randomness of the oracle.
Denote by $G_i$ the distribution $G$ conditioned on $Q_i$. Further, let $Z_i = \Pr_{q \sim G}\{q \in Q_i\}$. Observe that

$$G_i[q] = \begin{cases} G[q]/Z_i & q \in Q_i \\ 0 & \text{o.w.} \end{cases} \qquad (11)$$

The next lemma shows that PrivLearn answers evaluation queries with the desired multiplicative precision.

Lemma 4.7. With probability $1 - \beta/6k$ (over the randomness of PrivLearn), we have

$$\frac{Z_i}{3} \le \frac{|B_i|}{b_{\mathrm{iter}}} \le 3 Z_i. \qquad (12)$$

Proof. The lemma follows from a Chernoff bound with the fact that we chose $b_{\mathrm{iter}} \gg b_{\mathrm{base}}$. ■

Assuming that (12) holds, we can argue that the learning algorithm in step $i$ produces a "good" hypothesis
as expressed in the next lemma.

Lemma 4.8. Let $t_i \in \{t_1, \ldots, t_k\}$. Conditioned on (12), we have that with probability $1 - \beta/6k$ (over the internal
randomness of the learning algorithm invoked at step $i$) the hypothesis $h_i$ satisfies

$$\Pr_{q \sim G_i} \left\{ h_i(q) \ne f^D_{t_i}(q) \right\} \le \frac{\gamma}{k}.$$

Proof. This follows directly from the guarantee of the learning algorithm $\mathcal{L}$ once we argue that (with the
claimed probability):

1. Each example $q$ is sampled from $G_i$ and labeled correctly by $f^{D'}_{t_i}(q)$ and $f^{D'}_{t_i}(q) = f^D_{t_i}(q)$.

2. All evaluation queries asked by the learning algorithm are answered with the multiplicative error
allowed in Definition 3.6.

3. The algorithm received sufficiently many, i.e., $b_{\mathrm{base}}$, labeled examples.

The first claim follows from the definition of $G_i$, since we can sample from $G_i$ by sampling from $G$ and
rejecting if the oracle TO returns $\perp$. Since $\Gamma$ is assumed to hold, we can invoke property (1) of Lemma 4.3
to conclude that whenever the oracle does not return $\perp$, then its answer agrees with $f^{D'}_{t_i}(q)$ and moreover
$f^{D'}_{t_i}(q) = f^D_{t_i}(q)$.

To see the second claim, consider an evaluation query $q$. We consider two cases. The first case is where
the threshold oracle returns $\perp$ and PrivLearn outputs $(0, \perp)$. Note that in this case indeed $G_i$ puts 0 weight
on the query $q$. In the second case PrivLearn outputs $(G[q] \cdot b_{\mathrm{iter}} / |B_i|,\, l)$. By (11) and since we assumed $\Gamma$
holds, the output satisfies the desired multiplicative bound.

The third claim is a direct consequence of Claim 4.6. ■ 

We conclude from the above that with probability $1 - \beta/3$ (over the combined randomness of PrivLearn
and of the learning algorithm), simultaneously for all $i \in [k]$ we have

$$\Pr_{q \sim G} \left\{ h_i(q) \ne f^D_{t_i}(q) \;\middle|\; q \in Q_i \right\} = \Pr_{q \sim G_i} \left\{ h_i(q) \ne f^D_{t_i}(q) \right\} \le \frac{\gamma}{k}. \qquad (13)$$

This follows from a union bound over all $k$ applications of Lemma 4.7 and Lemma 4.8.

We can now complete the proof of Lemma 4.5. That is, we will show that assuming (13) the hypothesis
$h$ satisfies

$$\Pr_{q \sim G} \left\{ |h(q) - f^D(q)| \le \alpha \right\} \ge 1 - \gamma.$$

Note that

1. (13) occurs with probability $1 - \beta/3$,

2. our assumption on the threshold oracle, i.e., $R \in \Gamma$ also occurs with probability $1 - \beta/3$ (over the
randomness of the oracle)

3. the event in Claim 4.6 holds with probability $1 - \beta/3$.

Hence all three events occur simultaneously with probability $1 - \beta$ which is what we claimed. We proceed
by assuming that all three events occurred. In the following, let

Eni = [qeQ:hi{q)^f^{q)] 

denote the set of points on which $h_i$ errs. We will need the following claim.

Claim 4.9. Let $q \in Q$. Then,

$$|h(q) - f(q)| > \alpha \;\Longrightarrow\; q \in \bigcup_{i \in [k]} \mathrm{Err}_i \cap Q_i.$$
Proof. Arguing in the contrapositive, suppose $q \notin \bigcup_{i \in [k]} \mathrm{Err}_i \cap Q_i$. This means that for all $i \in [k]$ we have that either $q \notin \mathrm{Err}_i$ or $q \notin Q_i$.

However, we claim that there can be at most one $i \in [k]$ such that $q \notin Q_i$, meaning that $q$ is rejected
at step $i$. This follows from property (2) of Lemma 4.3, which asserts that if $q \notin Q_i$, then we must have $|f^{D}(q) - t_i| < \alpha/7$, and the fact that any two thresholds differ by at least $\alpha/3$.

Hence, under the assumption above it must be the case that $q \notin \mathrm{Err}_i$ for all but at most one $i \in [k]$. This means that all but one of the hypotheses $h_i$ correctly classify $q$. Since the thresholds are spaced $\alpha/3$ apart, this means the hypothesis $h$ has error at most $2\alpha/3 \le \alpha$ on $q$. ■



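With the previous claim, we can finish the proof. Indeed,

$$
\begin{aligned}
\Pr_{q \sim G}\left[\, |h(q) - f(q)| > \alpha \,\right]
&\le \Pr_{q \sim G}\left[\, q \in \bigcup_{i \in [k]} \mathrm{Err}_i \cap Q_i \right] && \text{(using Claim 4.9)} \\
&\le \sum_{i=1}^{k} \Pr_{q \sim G}\left[\, q \in \mathrm{Err}_i \cap Q_i \,\right] && \text{(union bound)} \\
&\le \sum_{i=1}^{k} \Pr_{q \sim G}\left[\, q \in \mathrm{Err}_i \mid q \in Q_i \,\right] \\
&= \sum_{i=1}^{k} \Pr_{q \sim G_i}\left[\, q \in \mathrm{Err}_i \,\right] \\
&\le \gamma. && \text{(using (13))}
\end{aligned}
$$

This concludes the proof of Lemma 4.5.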



Lemma 4.4 (Privacy) together with Lemma 4.5 (Accuracy) concludes the proof of our main theorem, Theorem 3.9.



4.3 Quantitative Improvements without Membership Queries 

Here we show how to shave off a factor of $1/\alpha$ in the requirement on the data set size $n$ in Theorem 3.9. This is possible if the learning algorithm uses only sampling access to labeled examples.

Theorem 4.10. Let $\mathcal{U}$ be a data universe, $Q$ a set of query descriptions, $\mathcal{G}_Q$ a set of distributions over $Q$, and $P : Q \times \mathcal{U} \to \{0,1\}$ a predicate.

Then, there is an $\varepsilon$-differentially private $(\alpha,\beta,\gamma)$-accurate data-release algorithm provided that there is an algorithm $\mathcal{L}$ that $(\gamma,\beta)$-learns thresholds over $(Q, \mathcal{G}_Q', \{p_u : u \in \mathcal{U}\})$ using $b(n,\gamma,\beta)$ random examples; and we have

C ■ bin',y,P') ■ log(^^i^) • log(l/;S') 



n^ 



say 



where $n' = \Theta(\log |Q| / \alpha^2)$, $\beta' = \Theta(\beta\alpha)$, $\gamma' = \Theta(\gamma\alpha)$ and $C > 0$ is a sufficiently large constant. If $\mathcal{L}$ runs in time $t(n,\gamma,\beta)$ then the data release algorithm runs in time poly($t(n',\gamma',\beta'), n, 1/\alpha, \log(1/\beta), 1/\gamma$).

Proof. The proof of this theorem is identical to that of Theorem 3.9 except that we put ^totai = 'Obiter rather 
than Ikbiter- It is easy to check that the algorithm indeed makes only Z^totai distinct queries (in the sense of 
Lemma 4.1) to the threshold oracle so that privacy remains ensured. The correctness argument is identical. 



5 First Application: Data Release for Conjunctions 

With Theorems 3.9 and 4.10 in hand, we can obtain new data release algorithms "automatically" from learning algorithms that satisfy the properties required by the theorems. In this section we present such data release algorithms for conjunction counting queries using learning algorithms (which require only random examples and work under any distribution) based on polynomial threshold functions.

Throughout this section we fix the query class under consideration to be monotone conjunctions, i.e., we take $\mathcal{U} = Q = \{0,1\}^d$ and $P(q,u) = 1 - \bigvee_{i : u_i = 0} q_i$.
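To make the query class concrete, the following is a minimal Python sketch of the monotone conjunction predicate and the counting and thresholded queries built from it. The function names are illustrative assumptions and do not come from the paper; data items and queries are represented as tuples of 0/1 values.

```python
from typing import Sequence


def conjunction_predicate(q: Sequence[int], u: Sequence[int]) -> int:
    """P(q, u) = 1 - OR_{i : u_i = 0} q_i.

    The predicate is 1 exactly when every coordinate selected by q is set in u.
    """
    return 0 if any(qi == 1 and ui == 0 for qi, ui in zip(q, u)) else 1


def counting_query(q: Sequence[int], database: Sequence[Sequence[int]]) -> float:
    """f^D(q): fraction of data items u in D with P(q, u) = 1."""
    return sum(conjunction_predicate(q, u) for u in database) / len(database)


def thresholded_query(q: Sequence[int], database: Sequence[Sequence[int]], t: float) -> int:
    """f^D_t(q) = 1 if f^D(q) >= t, and 0 otherwise."""
    return 1 if counting_query(q, database) >= t else 0


# Hypothetical toy database over d = 3 attributes.
D = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
q = (1, 1, 0)  # asks: "are attributes 1 and 2 both set?"
print(counting_query(q, D), thresholded_query(q, D, 0.5))
```

The learning algorithms discussed in this section are asked to learn functions of the form `thresholded_query(., D, t)` for a fixed database and threshold.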

The learning results given later in this section, together with Theorem 4.10, immediately yield: 

Theorem 5.1 (Releasing conjunction counting queries). 1. There is an $\varepsilon$-differentially private algorithm for releasing the class of monotone Boolean conjunction queries over $\mathcal{G}_Q$ = {all probability distributions over $Q$} which is $(\alpha,\beta,\gamma)$-accurate and has runtime poly($n$) for databases of size $n$ provided that



n>d \ '"' I O 



eay^ 



2. There is an $\varepsilon$-differentially private algorithm for releasing the class of monotone Boolean conjunction queries over $\mathcal{G}_{Q_k}$ = {all probability distributions over $Q$ supported on $B_k = \{q \in Q : q_1 + \cdots + q_d \le k\}$} which is $(\alpha,\beta,\gamma)$-accurate and has runtime poly($n$) for databases of size $n$ provided that



■oQM^)) (^/ log(V/^)- 



\ say^ 

These algorithms are distribution-free, and so we can apply the boosting machinery of [DRV10] to get accurate answers to all of the $k$-way conjunctions with similar database size bounds. See the discussion and Corollary 1.3 in the introduction.

In Section 5.1 we establish structural results showing that certain types of thresholded real-valued functions can be expressed as low-degree polynomial threshold functions. In Section 5.2 we state some learning results (for learning under arbitrary distributions) that follow from these representational results. Theorem 5.1 above follows immediately from combining the learning results of Section 5.2 with Theorem 4.10.

5.1 Polynomial threshold function representations 

Definition 5.2. Let $X \subseteq Q = \{0,1\}^d$ and let $f$ be a Boolean function $f : X \to \{0,1\}$. We say that $f$ has a polynomial threshold function (PTF) of degree $a$ over $X$ if there is a real polynomial $A(q_1, \ldots, q_d)$ of degree $a$ such that

$$f(q) = \mathrm{sign}(A(q)) \quad \text{for all } q \in X,$$

where the sign function is $\mathrm{sign}(z) = 1$ if $z \ge 0$ and $\mathrm{sign}(z) = 0$ if $z < 0$.

Note that the polynomial $A$ may be assumed without loss of generality to be multilinear since $X$ is a subset of $\{0,1\}^d$.

5.1.1 Low-degree PTFs over sparse inputs 

Let $B_k \subseteq \{0,1\}^d$ denote the collection of all points with Hamming weight at most $k$, i.e., $B_k = \{q \in \{0,1\}^d : q_1 + \cdots + q_d \le k\}$. The main result of this subsection is a proof that for any $t \in [0,1]$ the function $f^{D}_{t}$ has a low-degree polynomial threshold function over $B_k$.

Lemma 5.3. Fix $t \in [0,1]$. For any database $D$ of size $n$, the function $f^{D}_{t}$ has a polynomial threshold function of degree $O(\sqrt{k \log n})$ over the domain $B_k$.
To prove Lemma 5.3 we will use the following claim:

Claim 5.4. Fix $k > 0$ to be a positive integer and $\varepsilon > 0$. There is a univariate polynomial $s$ of degree $O(\sqrt{k \log(1/\varepsilon)})$ which is such that

1. $s(k) = 1$; and

2. $|s(j)| \le \varepsilon$ for all integers $0 \le j \le k - 1$.

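Proof. This claim was proved by Buhrman et al. [BCdWZ99], who gave a quantum algorithm which implies the existence of the claimed polynomial (see also Section 1.2 of [She09]). Here we give a self-contained construction of a polynomial $s$ with the claimed properties that satisfies the slightly weaker degree bound $\deg(s) = O(\sqrt{k} \log(1/\varepsilon))$. We will use the univariate Chebyshev polynomial $C_r$ of degree $r = \lceil \sqrt{k} \rceil$. Consider the polynomial

$$s(j) = \left( \frac{C_r\!\left( \frac{j}{k} \left( 1 + \frac{1}{k} \right) \right)}{C_r\!\left( 1 + \frac{1}{k} \right)} \right)^{\lceil \log(1/\varepsilon) \rceil}. \qquad (14)$$

It is clear that if $j = k$ then $s(j) = 1$ as desired, so suppose that $j$ is an integer $0 \le j \le k - 1$. This implies that $(j/k)(1 + 1/k) < 1$. Now well-known properties of the Chebyshev polynomial (see e.g. [Che66]) imply that $|C_r((j/k)(1 + 1/k))| \le 1$ and $C_r(1 + 1/k) \ge 2$, so that (with logarithms to base 2) $|s(j)| \le 2^{-\lceil \log(1/\varepsilon) \rceil} \le \varepsilon$. Since $s$ has degree $r \cdot \lceil \log(1/\varepsilon) \rceil$, this gives the $O(\sqrt{k} \log(1/\varepsilon))$ degree bound. ■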
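The construction in (14) is easy to check numerically. Below is a minimal Python sketch, assuming base-2 logarithms and $0 < \varepsilon < 1$; the helper names `chebyshev` and `make_amplifier` are illustrative and not from the paper.

```python
import math


def chebyshev(r: int, x: float) -> float:
    """Evaluate the degree-r Chebyshev polynomial of the first kind at x."""
    prev, cur = 1.0, x  # C_0(x) = 1, C_1(x) = x
    if r == 0:
        return prev
    for _ in range(r - 1):  # recurrence C_{m+1}(x) = 2x C_m(x) - C_{m-1}(x)
        prev, cur = cur, 2.0 * x * cur - prev
    return cur


def make_amplifier(k: int, eps: float):
    """Return (s, degree) where s is the polynomial of equation (14).

    Assumes k >= 1 and 0 < eps < 1.
    """
    r = math.ceil(math.sqrt(k))
    power = math.ceil(math.log2(1.0 / eps))
    top = chebyshev(r, 1.0 + 1.0 / k)  # denominator C_r(1 + 1/k)

    def s(j: float) -> float:
        return (chebyshev(r, (j / k) * (1.0 + 1.0 / k)) / top) ** power

    return s, r * power


s, degree = make_amplifier(k=16, eps=0.01)
print("degree:", degree)
print("s(k):", s(16))
print("max |s(j)| for j < k:", max(abs(s(j)) for j in range(16)))
```

The reported degree is $\lceil \sqrt{k} \rceil \cdot \lceil \log_2(1/\varepsilon) \rceil$, matching the weaker bound proved here rather than the $O(\sqrt{k \log(1/\varepsilon)})$ bound of [BCdWZ99].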

Recall that the predicate function for a data item $u \in \{0,1\}^d$ is denoted by

$$p_u(q) = 1 - \bigvee_{i : u_i = 0} q_i.$$

As an easy corollary of Claim 5.4 we get:



Corollary 5.5. Fix $\varepsilon > 0$. For every $u \in \{0,1\}^d$, there is a $d$-variable polynomial $A_u$ of degree $O(\sqrt{k \log(1/\varepsilon)})$ which is such that for every $q \in B_k$,

1. If $p_u(q) = 1$ then $A_u(q) = 1$;

2. If $p_u(q) = 0$ then $|A_u(q)| \le \varepsilon$.

Proof. Consider the linear function $L(q) = k - \sum_{i : u_i = 0} q_i$. For $q \in B_k$ we have that $L(q)$ is an integer in $\{0, \ldots, k\}$, and we have $L(q) = k$ if and only if $p_u(q) = 1$. The desired polynomial is $A_u(q) = s(L(q))$. ■

Proof of Lemma 5.3. Consider the polynomial

$$A(q) = \sum_{u \in D} A_u(q),$$

where for each data item $u$, $A_u$ is the polynomial from Corollary 5.5 with its "$\varepsilon$" parameter set to $\varepsilon = 1/(3n)$. We will show that $A(q) - (\lceil tn \rceil - 1/2)$ is the desired polynomial which gives a PTF for $f^{D}_{t}$ over $B_k$.

First, consider any fixed $q \in B_k$ for which $f^{D}_{t}(q) = 1$. Such a $q$ must satisfy $f^{D}(q) = j/n \ge t$ for some integer $j$, and hence $j \ge \lceil tn \rceil$. Corollary 5.5 now gives that $A(q) \ge \lceil tn \rceil - 1/3$.

Next, consider any fixed $q \in B_k$ for which $f^{D}_{t}(q) = 0$. Such a $q$ must satisfy $f^{D}(q) = j/n < t$ for some integer $j$, and hence $j \le \lceil tn \rceil - 1$. Corollary 5.5 now gives that $A(q) \le \lceil tn \rceil - 2/3$. This proves the lemma. ■
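The proof can be sanity-checked by brute force on a small instance. The sketch below is an illustration only: it uses the self-contained Chebyshev construction from the proof of Claim 5.4 (with base-2 logarithms), and the helper names and the toy database are assumptions, not part of the paper.

```python
import itertools
import math


def chebyshev(r, x):
    """Degree-r Chebyshev polynomial of the first kind, via the recurrence."""
    prev, cur = 1.0, x
    if r == 0:
        return prev
    for _ in range(r - 1):
        prev, cur = cur, 2.0 * x * cur - prev
    return cur


def amplifier(k, eps):
    """The polynomial s of equation (14): s(k) = 1 and |s(j)| <= eps for j < k."""
    r = math.ceil(math.sqrt(k))
    power = math.ceil(math.log2(1.0 / eps))
    top = chebyshev(r, 1.0 + 1.0 / k)
    return lambda j: (chebyshev(r, (j / k) * (1.0 + 1.0 / k)) / top) ** power


def ptf_polynomial(q, database, k, t):
    """Evaluate A(q) - (ceil(t n) - 1/2), where A(q) = sum_u s(L_u(q))."""
    n = len(database)
    s = amplifier(k, 1.0 / (3 * n))
    total = 0.0
    for u in database:
        # L_u(q) = k - sum_{i : u_i = 0} q_i, an integer in {0, ..., k} on B_k.
        total += s(k - sum(qi for qi, ui in zip(q, u) if ui == 0))
    return total - (math.ceil(t * n) - 0.5)


def agrees_on_sparse_inputs(database, k, t):
    """Check sign(A(q) - (ceil(t n) - 1/2)) = f^D_t(q) for every q in B_k."""
    d, n = len(database[0]), len(database)
    for q in itertools.product((0, 1), repeat=d):
        if sum(q) > k:
            continue  # q is outside B_k
        count = sum(all(not (qi and not ui) for qi, ui in zip(q, u)) for u in database)
        truth = 1 if count / n >= t else 0
        if (1 if ptf_polynomial(q, database, k, t) >= 0 else 0) != truth:
            return False
    return True


# Hypothetical toy database with n = 4 items over d = 5 attributes.
D = [(1, 1, 0, 1, 0), (1, 0, 1, 1, 1), (0, 1, 1, 0, 1), (1, 1, 1, 1, 0)]
print(agrees_on_sparse_inputs(D, k=2, t=0.5))
```

Thresholds for which $tn$ is exactly representable in floating point (as in this example) avoid rounding surprises in `math.ceil(t * n)`.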
5.1.2 Low-degree PTFs over the entire hypercube

Taking $k = d$ in the previous subsection, the results there imply that $f^{D}_{t}$ can be represented by a polynomial threshold function of degree $O(\sqrt{d \log n})$ over the entire Boolean hypercube $\{0,1\}^d$. In this section we improve the degree to $O(d^{1/3} (\log n)^{2/3})$. This result is very similar to Theorem 8 of [KOS04] (which is closely based on the main construction and result of [KS04]) but with a few differences: first, we use Claim 5.4 to obtain slightly improved bounds. Second, we need to use the following notion in place of the notion of the "size of a conjunction" that was used in the earlier results:

Definition 5.6. The width of a data item $u \in D$ is defined as the number of coordinates $i$ such that $u_i = 0$. The width of $D$ is defined as the maximum width of any data item $u \in D$.

We use the following lemma: 

Lemma 5.7. Fix any $t \in [0,1]$ and suppose that an $n$-element database $D$ has width $w$. Then $f^{D}_{t}$ has a polynomial threshold function of degree $O(\sqrt{w \log n})$ over the domain $\{0,1\}^d$.

Proof. The proof follows the constructions and arguments of the previous subsection, but with "$w$" in place of "$k$" throughout (in particular the linear function $L(q)$ is now defined to be $L(q) = w - \sum_{i : u_i = 0} q_i$). ■

Lemma 5.8. Fix any value $r \in \{1, \ldots, d\}$. The function $f^{D}_{t}(q_1, \ldots, q_d)$ can be expressed as a decision tree $T$ in which

1. each internal node of the tree contains a variable $q_i$;

2. each leaf of $T$ contains a function of the form $f^{D'}_{t}$ where $D' \subseteq D$ has width at most $r$;

3. the tree $T$ has rank at most $(2d/r) \ln n + 1$.

Proof sketch. The result follows directly from the proof of Lemma 10 in [KS04], except that we use the notion of width from Definition 5.6 in place of the notion of the size of a conjunction that is used in [KS04]. To see that this works, observe that since $p_u(q) = 1 - \bigvee_{i : u_i = 0} q_i$, fixing $q_i = 1$ will fix all predicates $p_u$ with $u_i = 0$ to be zero. Thus the analysis of [KS04] goes through unchanged, replacing "terms of $f$ that have size at least $r$" with "data items in $D$ that have width at least $r$" throughout. ■

Lemma 5.9. The function $f^{D}_{t}$ can be represented as a polynomial threshold function of degree $O(d^{1/3} (\log n)^{2/3})$.

Proof. The proof is nearly identical to the proof of Theorem 2 in [KS04] but with a few small changes. We take $r$ in Lemma 5.8 to be $d^{2/3} (\log n)^{1/3}$ and now apply Lemma 5.7 to each width-$r$ database $D'$ at a leaf of the resulting decision tree. Arguing precisely as in Theorem 2 of [KS04] we get that $f^{D}_{t}$ has a polynomial threshold function of degree

$$\max\left\{ \frac{2d}{r} \ln n + 1,\; O\!\left( \sqrt{r \log n} \right) \right\} = O\!\left( \sqrt{r \log n} \right) = O\!\left( d^{1/3} (\log n)^{2/3} \right).$$
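As a quick check of why this choice of $r$ balances the two terms, the following worked calculation substitutes $r = d^{2/3} (\log n)^{1/3}$ into each of them, treating $\ln n$ and $\log n$ as interchangeable up to constant factors:

```latex
\begin{align*}
\frac{2d}{r} \ln n
  &= \frac{2d \ln n}{d^{2/3} (\log n)^{1/3}}
   = \Theta\!\left( d^{1/3} (\log n)^{2/3} \right), \\
\sqrt{r \log n}
  &= \sqrt{d^{2/3} (\log n)^{1/3} \cdot \log n}
   = \sqrt{d^{2/3} (\log n)^{4/3}}
   = d^{1/3} (\log n)^{2/3}.
\end{align*}
```

The first term decreases in $r$ and the second increases in $r$, so moving $r$ away from this value by more than a constant factor makes one of the two terms asymptotically larger.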



5.2 Learning thresholds of conjunction queries under arbitrary distributions 

It is well known that using learning algorithms based on polynomial-time linear programming, having low- 
degree PTFs for a class of functions implies efficient PAC learning algorithms for that class under any 
distribution using random examples only (see e.g. [KS04, HS07]). Thus the representational results of 
Section 5.1 immediately give learning results for the class of threshold functions over sums of data items. 
We state these learning results using the terminology of our reduction below. 
Theorem 5.10. Let

- $\mathcal{U}$ denote the data universe $\{0,1\}^d$;

- $Q$ denote the set of query descriptions $\{0,1\}^d$;

- $P(q,u) = 1 - \bigvee_{i : u_i = 0} q_i$ denote the monotone conjunction predicate;

- $\mathcal{G}_Q$ denote the set of all probability distributions over $Q$; and

- $\mathcal{G}_{Q_k}$ denote the set of all probability distributions over $Q$ that are supported on $B_k = \{q \in \{0,1\}^d : q_1 + \cdots + q_d \le k\}$.

Then

1. (Learning thresholds of conjunction queries over all inputs) There is an algorithm $\mathcal{L}$ that $(\gamma,\beta)$-learns thresholds over $(Q, \mathcal{G}_Q, \{p_u : u \in \mathcal{U}\})$ using $b(n,\gamma,\beta) = d^{O(d^{1/3} (\log n)^{2/3})} \cdot \tilde{O}(1/\gamma) \cdot \log(1/\beta)$ queries to an approximate distribution-restricted evaluation oracle for the target $n$-threshold function (in fact $\mathcal{L}$ only uses sampling access to labeled examples). The running time of $\mathcal{L}$ is poly($b(n,\gamma,\beta)$).

2. (Learning thresholds of conjunction queries over sparse inputs) There is an algorithm $\mathcal{L}$ that $(\gamma,\beta)$-learns thresholds over $(Q, \mathcal{G}_{Q_k}, \{p_u : u \in \mathcal{U}\})$ using $b(n,\gamma,\beta) = d^{O(\sqrt{k \log n})} \cdot \tilde{O}(1/\gamma) \cdot \log(1/\beta)$ queries to an approximate distribution-restricted evaluation oracle for the target $n$-threshold function (in fact $\mathcal{L}$ only uses sampling access to labeled examples). The running time of $\mathcal{L}$ is poly($b(n,\gamma,\beta)$).

Recall from the discussion at the beginning of Section 5 that these learning results, together with our 
reduction, give the private data release results stated at the beginning of the section. 
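To illustrate how a low-degree PTF representation turns into a learner from labeled examples, here is a small Python sketch. It is not the linear-programming learner referred to at the start of this subsection: as a stand-in that needs no LP solver, it runs the classical perceptron update over all monomials of degree at most a chosen bound, which finds a consistent PTF whenever one with positive margin exists but does not carry the same running-time guarantee. All names and the toy database are illustrative assumptions.

```python
import itertools


def monomial_features(q, degree):
    """Values of all multilinear monomials of q with degree <= `degree` (constant included)."""
    feats = []
    for size in range(degree + 1):
        for idx in itertools.combinations(range(len(q)), size):
            feats.append(1.0 if all(q[i] for i in idx) else 0.0)
    return feats


def learn_ptf(examples, degree, max_epochs=10000):
    """Perceptron over monomial features.

    `examples` is a list of (q, label) pairs with label in {0, 1}. Returns a
    weight vector consistent with all examples, or None if none was found
    within `max_epochs` passes.
    """
    data = [(monomial_features(q, degree), 1 if label else -1) for q, label in examples]
    w = [0.0] * len(data[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return w
    return None


def predict(w, q, degree):
    """Hypothesis value: 1 if the learned polynomial is positive at q, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, monomial_features(q, degree)))
    return 1 if score > 0 else 0


# Hypothetical width-2 database over d = 4 attributes; target is "at least 2 of 3 items match".
D = [(0, 0, 1, 1), (1, 0, 0, 1), (1, 1, 0, 1)]


def target(q):
    matches = sum(all(not (qi and not ui) for qi, ui in zip(q, u)) for u in D)
    return 1 if matches >= 2 else 0


examples = [(q, target(q)) for q in itertools.product((0, 1), repeat=4)]
w = learn_ptf(examples, degree=2)
print(w is not None and all(predict(w, q, 2) == y for q, y in examples))
```

Because every data item in this toy database has width at most 2, each $p_u$ is itself a polynomial of degree at most 2, so a degree-2 PTF with margin exists and the perceptron is guaranteed to converge on this instance.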

6 Second Application: Data Release via Fourier-Based Learning 

In this section we present data release algorithms for parity counting queries and $\mathrm{AC}^0$ counting queries that instantiate our reduction, Theorem 3.9, with Fourier-based algorithms from the computational learning theory literature. We stress that these algorithms require the more general reduction of Theorem 3.9 rather than the simpler version of Theorem 1.1 because the underlying learning algorithms are not distribution-free. We first give our results for parity counting queries in Section 6.1 and then our results for $\mathrm{AC}^0$ counting queries in Section 6.2.

6.1 Parity counting queries using the Harmonic Sieve [Jac97] 

In this subsection we fix the query class under consideration to be the class of parity queries, i.e., we take $\mathcal{U} = \{0,1\}^d$ and $Q = \{0,1\}^d$ and we take $P(q,u) = \sum_{i : u_i = 1} q_i \pmod 2$ to be the parity predicate. Our main result for releasing parity counting queries is:

Theorem 6.1 (Releasing parity counting queries). There is an $\varepsilon$-differentially private algorithm for releasing the class of parity queries over the uniform distribution on $Q$ which is $(\alpha,\beta,\gamma)$-accurate and has runtime poly($n$) for databases of size $n$, provided that

$$n \ge \frac{\mathrm{poly}(d, 1/\alpha, 1/\gamma, \log(1/\beta))}{\varepsilon}.$$

This theorem is an immediate consequence of our main reduction, Theorem 3.9, and the following learning result:
Theorem 6.2. Let

- $\mathcal{U}$ denote the data universe $\{0,1\}^d$;

- $Q$ denote the set of query descriptions $\{0,1\}^d$;

- $P(q,u) = \sum_{i : u_i = 1} q_i \pmod 2$ denote the parity predicate; and

- $\mathcal{G}_Q$ contain only the uniform distribution over $Q$.

Then there is an algorithm $\mathcal{L}$ that $(\gamma,\beta)$-learns thresholds over $(Q, \mathcal{G}_Q', \{p_u : u \in \mathcal{U}\})$, where $\mathcal{G}_Q'$ is the $(2/\gamma)$-smooth extension of $\mathcal{G}_Q$. Algorithm $\mathcal{L}$ uses $b(n,\gamma,\beta) = \mathrm{poly}(d, n, 1/\gamma) \cdot \log(1/\beta)$ queries to an approximate $G$-restricted evaluation oracle for the target $n$-threshold function when it is learning with respect to a distribution $G \in \mathcal{G}_Q'$. The running time of $\mathcal{L}$ is poly($b(n,\gamma,\beta)$).

Proof. The claimed algorithm $\mathcal{L}$ is essentially Jackson's Harmonic Sieve algorithm [Jac97] for learning Majority of Parities; however, a bit of additional analysis of the algorithm is needed, as we now explain.

When Jackson's results on the Harmonic Sieve are expressed in our terminology, they give Theorem 6.2 exactly as stated above except for one issue, which we now describe. Let $G'$ be any distribution in the $(2/\gamma)$-smooth extension $\mathcal{G}_Q'$ of the uniform distribution. In Jackson's analysis, when it is learning a target function $f$ under distribution $G'$, the Harmonic Sieve is given black-box oracle access to $f$, sampling access to the distribution $G'$, and access to a $c$-approximation to an evaluation oracle for $G'$, in the following sense: there is some fixed constant $c \in [1/3, 3]$ such that when the oracle is queried on $q \in Q$, it outputs $c \cdot G'[q]$. This is a formally more powerful type of access to the underlying distribution $G'$ than is allowed in Theorem 6.2, since Theorem 6.2 only gives $\mathcal{L}$ access to an approximate $G'$-restricted evaluation oracle for the target function (recall Definition 3.6). To be more precise, the only difference is that with the Sieve's black-box oracle access to the target function $f$ it is a priori possible for a learning algorithm to query $f$ even on points where the distribution $G'$ puts zero probability mass, whereas such queries are not allowed for $\mathcal{L}$. Thus to prove Theorem 6.2 it suffices to argue that the Harmonic Sieve algorithm, when it is run under distribution $G'$, never needs to make queries on points $q \in Q$ that have $G'[q] = 0$.

Fortunately, this is an easy consequence of the way the Harmonic Sieve algorithm works. Instead of using black-box oracle queries for $f$, the algorithm only ever makes oracle queries to the function $g(q) = 2^d \cdot f(q) \cdot D'[q]$, where $D'$ is a $c$-approximation to an evaluation oracle for a distribution $G''$ which is a smooth extension of $G'$. (See the discussion in Sections 4.1 and 4.2 of [Jac97], in particular Steps 16-18 of the HS algorithm of Figure 4 and Steps 3 and 5 of the WDNF algorithm of Figure 3.) By the definition of a smooth extension, if $q$ is such that $G'[q] = 0$ then $G''[q]$ also equals 0, and consequently $g(q) = 0$ as well. Thus it is straightforward to run the Harmonic Sieve using access to an approximate $G'$-restricted evaluation oracle: if the oracle reports $G'[q] = 0$ then "0" is the correct value of $g(q)$, and otherwise the oracle provides precisely the information that would be available to the Sieve in Jackson's original formulation.
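The argument can be pictured with a small Python sketch. Everything here is a simplified assumption made for illustration: the restricted oracle is modeled as returning the pair $(c \cdot G'[q], f(q))$ on the support of $G'$ and $(0, \text{None})$ elsewhere, and the reweighting that turns $G'$ into the boosting distribution $G''$ is passed in as a hypothetical `reweight` function. Neither interface is taken from [Jac97] or from Definition 3.6.

```python
from typing import Callable, Dict, Optional, Tuple

Query = Tuple[int, ...]


def make_restricted_oracle(weights: Dict[Query, float],
                           f: Callable[[Query], int],
                           c: float = 1.0):
    """Hypothetical approximate G'-restricted evaluation oracle.

    Returns (c * G'[q], f(q)) when G'[q] > 0 and (0.0, None) otherwise, so
    f is never revealed on points outside the support of G'.
    """
    def oracle(q: Query) -> Tuple[float, Optional[int]]:
        mass = weights.get(q, 0.0)
        if mass == 0.0:
            return 0.0, None
        return c * mass, f(q)
    return oracle


def sieve_query(oracle, q: Query,
                reweight: Callable[[Query], float] = lambda q: 1.0) -> float:
    """Answer g(q) = 2^d * f(q) * D'[q] using only the restricted oracle.

    `reweight` is an assumed stand-in for the boosting reweighting, so that
    D'[q] is modeled as (c * G'[q]) * reweight(q).
    """
    approx_mass, label = oracle(q)
    if label is None:
        # G'[q] = 0 forces G''[q] = 0, so g(q) = 0 without ever learning f(q).
        return 0.0
    return (2 ** len(q)) * label * approx_mass * reweight(q)


# Toy example over d = 2 with a +/-1-valued target.
weights = {(0, 0): 0.5, (1, 1): 0.5}
f = lambda q: -1 if sum(q) % 2 else 1
oracle = make_restricted_oracle(weights, f)
print(sieve_query(oracle, (1, 1)), sieve_query(oracle, (0, 1)))
```

The point of the sketch is only that the zero-mass branch never needs the label, which is the property the proof relies on.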



6.2 $\mathrm{AC}^0$ queries using [JKS02]

Fix $\mathcal{U} = \{0,1\}^d$ and $Q = \{0,1\}^d$. In this subsection we show that our reduction enables us to do efficient private data release for quite a broad class of queries, namely any query computed by a constant-depth circuit.

In more detail, let $P(q,u) : \{0,1\}^d \times \{0,1\}^d \to \{0,1\}$ be any predicate that is computed by a circuit of depth $\ell = O(1)$ and size poly($d$). Our data release result for such queries is the following:
Theorem 6.3 (Releasing $\mathrm{AC}^0$ queries). Let $\mathcal{G}_Q$ be the set containing the uniform distribution and let $\mathcal{U}, Q, P$ be as described above. There is an $\varepsilon$-differentially private $(\mathcal{U}, Q, \mathcal{G}_Q, P)$ data release algorithm that is $(\alpha,\beta,\gamma)$-accurate and has runtime poly($n$) for databases of size $n$, provided that

See the introduction for a discussion of this result. We observe that given any fixed $P$ as described above, for any given $u \in \mathcal{U} = \{0,1\}^d$ the function $p_u(q)$ is computed by a circuit of depth $\ell$ and size poly($d$) over the input bits $q_1, \ldots, q_d$. Hence Theorem 6.3 is an immediate consequence of Theorem 3.9 and the following learning result, which describes the performance guarantee of the quasipolynomial-time algorithm of Jackson et al. [JKS02] for learning Majority-of-Parity in our language:

Theorem 6.4 (Theorem 9 of [JKS02]). Let

- $\mathcal{U}$ denote the data universe $\{0,1\}^d$;

- $Q$ denote the set of query descriptions $\{0,1\}^d$;

- $P(q,u)$ be any fixed predicate computed by an AND/OR/NOT circuit of depth $\ell = O(1)$ and size poly($d$);

- $\mathcal{G}_Q$ contain only the uniform distribution over $Q$; and

- $\mathcal{F}$ be the set of all AND/OR/NOT circuits of depth $\ell$ and size poly($d$).

Then there is an algorithm $\mathcal{L}$ that $(\gamma,\beta)$-learns $n$-thresholds over $(Q, \mathcal{G}_Q', \mathcal{F})$, where $\mathcal{G}_Q'$ is the $(2/\gamma)$-smooth extension of $\mathcal{G}_Q$. Algorithm $\mathcal{L}$ uses approximate distribution-restricted oracle access to the function, uses $b(n,\gamma,\beta) = d^{O(\log^{\ell}(nd/\gamma))} \cdot \log(1/\beta)$ samples and calls to the evaluation oracle, and runs in time $t(n,\gamma,\beta) =$

We note that Theorem 9 of [JKS02], as stated in that paper, only deals with learning majority-of-$\mathrm{AC}^0$ circuits under the uniform distribution: it says that an $n$-way Majority of depth-$\ell$, size-poly($d$) circuits over $\{0,1\}^d$ can be learned to accuracy $\gamma$ and confidence $\beta$ under the uniform distribution, using random examples only, in time $d^{O(\log^{\ell}(nd/\gamma))} \cdot \log(1/\beta)$. However, the boosting-based algorithm of [JKS02] is identical in its high-level structure to Jackson's Harmonic Sieve; the only difference is that the [JKS02] weak learner simply performs an exhaustive search over all low-weight parity functions to find a weak hypothesis that has non-negligible correlation with the target, whereas the Harmonic Sieve uses a more sophisticated membership-query algorithm (that is an extension of the algorithm of Kushilevitz and Mansour [KM93]). Arguments identical to the ones Jackson gives for the Harmonic Sieve (in Section 7.1 of [Jac97]) can be applied unchanged to the [JKS02] algorithm, to show that it extends, just like the Harmonic Sieve, to learning under smooth distributions if it is provided with an approximate evaluation oracle for the smooth distribution. In more detail, these arguments show that for any $C$-smooth distribution $G'$, given sampling access to labeled examples by $(G', f)$ (where $f$ is the target $n$-way Majority of depth-$\ell$, size-poly($d$) circuits) and approximate evaluation access to $G'$, the [JKS02] algorithm learns $f$ to accuracy $\gamma$ and confidence $\beta$ under $G'$ in time $d^{O(\log^{\ell}(Cnd/\gamma))} \cdot \log(1/\beta)$. This is the result that is restated in our data privacy language above (note that the smoothness parameter there is $C = 2/\gamma$).

7 Conclusion and open problems 

This work put forward a new reduction from privacy-preserving data analysis to learning thresholds. Instantiating this reduction with various different learning algorithms, we obtained new data release algorithms for a variety of query classes. One notable improvement was for the database size (or error) in distribution-free release of conjunctions and $k$-way conjunctions. Given these new results, we see no known obstacles to even more dramatic improvements on this central question. In particular, we conclude with the following open question.

Open Question 7.1. Is there a differentially private distribution-free data release algorithm (with constant error, e.g., $\alpha = 1/100$) for conjunctions or $k$-way conjunctions that works for databases of size poly($d$) and runs in time poly($n$) (or poly($n, d^k$) for the case of $k$-way conjunctions)?

Note that such an algorithm for $k$-way conjunctions would also imply, via boosting [DRV10], that we can privately release all $k$-way conjunctions in time poly($n, d^k$), provided that $|D| \ge$ poly($d$).